Mar 10 10

Why I Don’t Program Much Anymore

by karlkatzke

There have been some great discussions about the state of programming. Confession: I’m much more of a sysadmin and architecture guy than anything else at this point. If it doesn’t have a quick configuration file or a GUI, I don’t do much with it, because I don’t have the time to learn everything. That’s even after focusing our core web environment on two technologies (PHP/Python) and doing our best to reject anything that doesn’t fit into them.

Here’s the first one: Whatever Happened to Programming @ The Reinvigorated Programmer, and here’s its second part: It May Not Be As Bad as All That.

Pay special attention to the addendum in that second article. The money quote for me was in the big pull from a comment by jdeitrich on HackerNews:

We talk about ‘flow’ quite a lot in software and I just have to wonder what’s happening to us all in that respect. Just like a conversation becomes stilted if the speakers keep having to refer to their phrasebooks and dictionaries, I wonder how much longer it will be possible to retain any sort of flowful state when writing software. Might the idea of mastery disappear forever under a constant torrent of new tools and technologies?

It’s the death of the hobbyist programmer. There’s a new framework release in Symfony or Zend Framework every time I re-surface a week or two later. Even with 10 years’ experience with programming, unit tests, and a decent level of comfort from working with these frameworks since their 0.x versions, I spend all the time I *should* be coding with my nose in the docs, updating code that’s been deprecated or migrated. Just keeping up with one framework can be a full-time job.

How can anything get done like this?

Mar 7 10

Flux for OSX – Ding Dong, Dreamweaver’s Dead

by karlkatzke

I’m not interested in any of the apps in this current Mac Sale, but it did lead me to a page about Flux, which might be my next go-to for WYSIWYG HTML editing.

Feb 24 10

WordPress 2.9 and ACLs

by karlkatzke

WordPress 2.9 changed the permission structure away from the permission-based ACL, which confused many users, and created a role-based ACL where roles have permissions. This has royally fubared a few of my sites, which used extensive ACL settings with some custom plugins to enable fine-grained permissions. On the other hand, few people understood the old permission format, things were complicated enough that a user could trip over themselves and inadvertently grant multiple contradictory permissions to someone, and it was difficult to teach and explain the administrative interfaces.

The first step towards straightening out the new permissions structure is creating and/or changing the existing roles. Steph over at SillyBean has a good article on creating roles in the PHP code, and also mentions Justin Tadlock’s Members Plugin, which automates a bunch of the things that she explains. Of course, there’s always the WordPress Documentation, and the reasoning behind the changes is in this WordPress Trac ticket.

Feb 22 10

Backing Up Two Ways from Sunday

by karlkatzke

One method of backup or recovery isn’t enough. Period. No matter what anyone tells you, what the book says, what your boss says, or what you think you need, you need to be backing things up in many ways.

Here are a few examples.

MySQL

Theoretically, you could recover anything you needed from the binary log, as long as you’ve got a good starting point and a good ending point. (This, by the way, is a good reason to flush the binary logs and take a backup on a regular basis.) What if your binary log’s corrupted, though? You need to fall back to a full SQL backup … which you’re doing regularly, right?

If your binary log is corrupted, any mirrors you are using that are based on that binary log are corrupted as well.

Case in point: I had a client with a very active, very large database… north of 15GB in InnoDB. The binary log hit a bug and corrupted itself. The backups were being done from the mirror so that they didn’t interrupt the main machine’s processing, but the client only kept a few days’ worth, so by the time the corruption was noticed none of those backups were usable for a restore. The most recent un-corrupted dump from the main machine had been taken three months before. Luckily, the client had done some application-level backups to an XML format, and we were able to (laboriously) restore from that. It cost about $3,000, all because they didn’t want to degrade their forum’s performance for a half hour every night and pay for an extra TB or so of storage to keep more than a few days’ worth of backups.
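Here’s a rough sketch of the nightly full dump that would have saved that client the money. It only builds the mysqldump command line, so you can inspect it before wiring it into cron. The function name, the output directory, and the file naming are my own placeholders, not anything from the client’s setup; the mysqldump options themselves are standard ones (newer MySQL releases rename --master-data to --source-data).

```python
import datetime
import shlex


def mysqldump_command(out_dir, when=None):
    """Build a mysqldump invocation for a full dump taken at a known binlog point.

    out_dir and the file naming scheme are illustrative assumptions.
    """
    when = when or datetime.datetime.now()
    out_file = f"{out_dir}/full-{when:%Y%m%d-%H%M%S}.sql"
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--flush-logs",          # rotate to a fresh binary log at the dump point
        "--master-data=2",       # write the binlog file/position into the dump as a comment
        f"--result-file={out_file}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(mysqldump_command("/backups/mysql")))
```

With a dump like this on disk, the binary logs only ever have to cover the gap since the last dump, which is the "good starting point" mentioned above.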

Servers

Scenario: Hard drive gets corrupted or dies. You need to get the machine back up quickly. You have a snapshot of the machine … but your snapshot is on the same storage as that machine unless you back it up somewhere else.

On top of that, storage requirements have been growing rapidly for servers. Where a Linux server can take less than 1GB, Windows 2008R2 can take up 20GB with system files alone. (In fact, if you plan to have any data on that server, or keep any logs, we’d recommend going with 40GB minimum for your C: drive.) It’s important to back that up to something that’s not on the same system disks.

Better yet, take a hint from the application-level backups: back up your registry, configuration files, and data separately from the snapshot. We tend to use rsync for this role and put it in a rolling-backup mode with the --link-dest option to ease recovery.
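For anyone who hasn’t seen the rolling-backup trick, here is a minimal sketch of how the rsync call can be put together. The function name, the one-directory-per-day layout, and the paths in the usage line are assumptions for illustration, not our actual scripts; -a, --delete, and --link-dest are ordinary rsync options.

```python
import datetime
import posixpath


def rsync_snapshot_command(source, backup_root, previous=None, when=None):
    """Build an rsync command that writes a dated snapshot under backup_root.

    previous is the directory name of the last snapshot (hypothetical layout:
    one directory per day, named YYYY-MM-DD). backup_root should be absolute,
    because a relative --link-dest is resolved against the destination.
    """
    when = when or datetime.date.today()
    dest = posixpath.join(backup_root, when.isoformat())
    cmd = ["rsync", "-a", "--delete"]
    if previous:
        # Files unchanged since the previous snapshot become hard links to it,
        # so each day looks like a full copy but only changed files use space.
        cmd.append("--link-dest=" + posixpath.join(backup_root, previous))
    # Trailing slashes: copy the contents of source into the dated directory.
    cmd += [source.rstrip("/") + "/", dest + "/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(rsync_snapshot_command("/etc", "/backups/web01", previous="2010-02-21")))
```

Recovery is then just a copy out of whichever dated directory you want; there is no incremental chain to replay.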

VMWare

Same principle as above. Snapshots are usually stored in the same datastore. Datastore goes bye-bye, so do your snapshots.

There are some great products out there that can really help with this issue. The one we use is VEEAM Replication and Backup. It can be used to replicate a snapshot to another VMWare cluster, or back up the datastore files at a consistent snapshot point and then copy them elsewhere all in one step. We use a two-step process: we keep the backups locally on the backup server and also transmit them to another datacenter across campus.

When using VEEAM with Windows, make sure that VMWare Tools is installed and that you enable the VSS integration. (You’ll also need to make sure that the administrative share option on the system drives is enabled, and that the appropriate firewall ports are opened.) This ensures that you’ve got a transactionally consistent backup snapshot.

Practice, practice, practice

The only way to make sure that you can recover from a disaster is to test recovering from a disaster. At least once a year, we practice recovering from a worst-case scenario. That means bringing up a new machine from scratch, re-implementing all of the options and configurations, and then restoring the data. Despite that kind of restoration being something that should never happen, it does — and practice gives you insights into how to improve the processes and turns a recovery operation from an expensive nightmare that sets back all of your other processes into something that you can execute quickly and professionally.
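One part of a restore drill that is easy to automate is checking that what came back matches what went in. This is a minimal sketch, assuming both the original and the restored tree are reachable as local paths; the function names are made up for the example and the whole-file read is only sensible for modest file sizes.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(full, root)] = digest
    return digests


def restore_mismatches(original_root, restored_root):
    """Return relative paths that are missing, extra, or different after a restore."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    return sorted(
        path
        for path in original.keys() | restored.keys()
        if original.get(path) != restored.get(path)
    )
```

An empty list from restore_mismatches means the two trees hold the same files with the same contents; anything else is a list of paths to go investigate.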

Feb 13 10

Buzz not worth the Buzz

by karlkatzke

Besides the obvious, the thing that pisses me off most about Google Buzz is having to mark things read twice: once in Google Reader, once in Buzz. Still experimenting to see if I can hide/unfollow people in Google Reader and not have them unfollowed in Buzz, or vice versa.

Feb 10 10

Sun/Oracle Merger

by karlkatzke

I’m happy to see it; I’m happy to be involved in it.

Sun has some of the best ideas in the world. From a creativity point of view, they’re pretty amazing. From an implementation point of view, with some notable exceptions (ex: Fishworks), they’re pitiful. Sun couldn’t get laid in a whorehouse wearing a suit made of hundred dollar bills.

Half of Sun’s ideas were half-baked. (Either go fully baked, à la Steve Jobs, or lay off whatever makes you write bad haiku, mmkay?) The x45xx line of servers is a wonderful idea and a wonderful form factor, and Sun overcame significant engineering challenges to develop it. Unfortunately, the first gen fell down hard under load and was practically unusable. The second gen is still suffering from high replacement part and add-on costs that don’t justify the price in many cases. The integration of ZFS and SSDs as ZIL/L2ARC is wonderful, but there are a ton of technical problems that customers keep running into and Sun keeps refusing to acknowledge. It took three months to solve the problem I was having with SSDs and ZIL. I place the blame for the former on poor management controls, and the latter on excessive outsourcing of core competencies. Both are failures of management to execute the brilliant ideas that engineers come up with.

It’s nice to see a company with a reputation for being able to execute and capitalize on new ideas come in. Oracle’s already started to cut, and all of the cuts I know of so far in my various interactions with the company have been well-justified. I’m really excited, from the point of view of someone with several relationships with the company, to see what comes of this merger.

Jan 21 10

Why Redhat’s Losing Market Share

by karlkatzke

Check this out:

RHN Fail

Yeah, that’s what you see when you visit rhn.redhat.com — which you need to use to administer redhat subscriptions. I can’t get my servers to subscribe while the site’s down, and I can’t manage my entitlements or buy new ones.

One of my consulting projects has been on hold for days while RHN sorts itself out. Worse, you can’t even log in to report the problem. If you click on the “contacting us” link, you get taken to a page with a couple of mailing lists. Well, why join a mailing list? I know the site’s down. I want to file an engineering report. I click the last option, which is supposed to allow me to file such a report. It says I need to log in to file a report. FAIL.

It does seem that there’s some awareness of the problem. Poking around in the rest of the redhat.com domain, I got messages like this:

Screen shot 2010-01-21 at 3.33.32 AM

Why’s Redhat losing market share? They can’t even run a website well. Who’s going to trust their server distro when they can’t get a website right?

Nov 23 09

LSI Logic SAS Driver for VMWare vSphere 4

by karlkatzke

If you’re using raw access to storage LUNs with VMWare, and you’re using Windows, you can use the LSI Logic SAS virtual SCSI adapter option and create virtual drives. This is better than using the Microsoft iSCSI initiator because you can edit the drive mappings with the machine powered off and you can clone the machine and easily redirect all of your storage before powering the machine on.

The correct driver to be using is the LSI SAS 1068 driver. You’ll need to make a floppy image using an image tool. If you have access to a Linux box, just use dd to create the image and then mount it and write files to it. If you’re on Windows, the venerable WinImage and other utilities exist. Either way, you’ll need to rename the file with a .flp extension and mount it on boot in your Windows VM in order to load the driver to see the drives.

I’ve been getting some pretty darned good performance with that and iSCSI LUNs on my Solaris server. I haven’t (yet) put together a decent test and some metrics to back it, but the machines on raw device LUNs feel a *lot* snappier than the machines that are on a 400GB VMFS. A good basic tutorial with iSCSI and ZFS is here: Running ZFS Over iSCSI as a vmware VMFS store — but note that I’m using raw LUNs after not being happy with the VMFS performance with a half-dozen hosts doing heavy I/O.
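If you don’t have dd handy, the blank floppy image from the driver step above can be made with a few lines of Python. This is only a sketch: it writes a zero-filled 1.44MB file, which is what dd from /dev/zero would give you, and the function name and file name are placeholders. The image still needs a FAT filesystem and the driver files copied onto it before the VM can use it.

```python
import os

# A 1.44 MB floppy is 2880 sectors of 512 bytes.
FLOPPY_BYTES = 2880 * 512


def create_blank_floppy_image(path):
    """Write a zero-filled floppy-sized image and return its size in bytes.

    The result is unformatted; put a FAT filesystem on it (for example with
    your platform's mkfs tool) before copying the driver files over.
    """
    with open(path, "wb") as image:
        image.write(b"\x00" * FLOPPY_BYTES)
    return os.path.getsize(path)


if __name__ == "__main__":
    # "lsi-sas.flp" is a made-up file name; VMWare only cares about the .flp extension.
    print(create_blank_floppy_image("lsi-sas.flp"))
```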

Nov 19 09

VMWare explorations…

by karlkatzke

GhettoVCB – VCB for free. Doesn’t get better than that.

Understanding VMWare Snapshots – Also, it’s probably a good idea to learn this stuff.

Nov 10 09

Biggest Problem in Airbus A380: Software

by karlkatzke

Software problems are the #1 thing that will keep an Airbus A380 on the ground. Yes, airplanes are complicated things … but at the same time, not much is required to keep most of them in the air.

What speaks volumes to me about these problems are a few key quotes.

Clark says that the problem with the nuisance warnings has been their diverse nature, but “the common thread” is the software. He says Airbus executive vice-president programmes Tom Williams and his team “have sat in my office many times and said they can’t identify trends, which is the worst possible thing”.

Clark blames the software’s design. “There was a philosophy of utopia – I suspect that Airbus was blessed with some boffins who said ‘we’ve got to make this absolutely perfect – no flexibility’. The slightest surge causes one [sensor] to trip and then six more as they’re all linked,” he says.

Anyone willing to take guesses about the type of architecture and software developers at Airbus?