<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Karl Katzke &#187; sysadmin</title>
	<atom:link href="http://www.karlkatzke.com/categories/linux/sysadmin/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.karlkatzke.com</link>
	<description>Geek of the Week</description>
	<lastBuildDate>Fri, 16 Dec 2011 19:54:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>RHEL 5 supports XFS out of box</title>
		<link>http://www.karlkatzke.com/rhel-5-supports-xfs-out-of-box/</link>
		<comments>http://www.karlkatzke.com/rhel-5-supports-xfs-out-of-box/#comments</comments>
		<pubDate>Thu, 26 May 2011 23:23:00 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=687</guid>
		<description><![CDATA[I was trying to figure out how to get XFS working on a RHEL box at work so that we could store more than 16TB on a filesystem, and found Gianpolo Del Matto&#8217;s excellent XFS on RHEL tutorial. And then I found out that most of that mucking around in the kernel build process isn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>I was trying to figure out how to get XFS working on a RHEL box at work so that we could store more than 16TB on a filesystem, and found <a href="http://phaq.phunsites.net/2008/02/04/enabling-reiserfs-xfs-jfs-on-redhat-enterprise-linux/">Gianpolo Del Matto&#8217;s excellent XFS on RHEL tutorial</a>. </p>
<p>And then I found out that most of that mucking around in the kernel build process isn&#8217;t necessary. </p>
<p>RHEL5 u6 (at least my copies &#8212; note that I have the -xen kernel packages installed, which might affect things) actually has the XFS kernel module in /lib/modules/kernel/fs/xfs &#8212; it just doesn&#8217;t have the xfsprogs package, and the xfsprogs package is not available via any supported means. I downloaded the xfsprogs srpm from <a href="ftp://oss.sgi.com/projects/xfs/cmd_srpms-oct_09/">the XFS project page</a> and used rpmbuild to build and install it myself. </p>
<p>Does anyone else in this day and age find it ridiculous that Redhat does not support filesystems larger than 16TB without some form of hackery? I can buy that much disk storage at my local Office Depot. </p>
<p>Note that I&#8217;m pretty sure that the mount option <code>inode64</code> is required if you&#8217;re over 16TB, and that 32-bit NFS clients do not like the <code>inode64</code> option one tiny bit. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/rhel-5-supports-xfs-out-of-box/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gawker reminds us all of why security is important</title>
		<link>http://www.karlkatzke.com/gawker-reminds-us-all-of-why-security-is-important/</link>
		<comments>http://www.karlkatzke.com/gawker-reminds-us-all-of-why-security-is-important/#comments</comments>
		<pubDate>Mon, 13 Dec 2010 20:43:22 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=620</guid>
		<description><![CDATA[This Forbes article on the recent Gawker hack is well researched and written. And if you weren&#8217;t aware of the hack, now you are. (You can go look up your email&#8217;s md5 hash in this google doc to find out if you&#8217;ve been compromised.) One nitpick about the article: They pointed out that the Gawker [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.forbes.com/firewall/2010/12/13/the-lessons-of-gawkers-security-mess/">This Forbes article on the recent Gawker hack</a> is well researched and written. And if you weren&#8217;t aware of the hack, now you are. (You can go look up your <a href="http://pajhome.org.uk/crypt/md5/">email&#8217;s md5 hash</a> in <a href="http://www.google.com/fusiontables/DataSource?dsrcid=350662">this google doc</a> to find out if you&#8217;ve been compromised.) </p>
<p>One nitpick about the article: They pointed out that the Gawker servers were running Linux 2.6.18 &#8212; that likely means that they&#8217;re running Red Hat Enterprise Linux, CentOS, or another RHEL derivative. RHEL&#8217;s current kernel is 2.6.18-194.26.1 &#8212; the high final numbers mean that Red Hat has done a lot of backporting for security reasons. Of course, for those of us tied to RHEL, it serves as a reminder of why we should consider something a little fresher&#8230; </p>
<p>And man, doesn&#8217;t that make Gawker look like a bunch of incompetent dicks? You&#8217;ve got the iPhone thing, and now this&#8230; </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/gawker-reminds-us-all-of-why-security-is-important/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building out a small datacenter</title>
		<link>http://www.karlkatzke.com/building-out-a-small-datacenter/</link>
		<comments>http://www.karlkatzke.com/building-out-a-small-datacenter/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 05:14:36 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[centos]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[vmware]]></category>
		<category><![CDATA[consulting]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=613</guid>
		<description><![CDATA[For one of my consulting gigs, I had a requirement to build out a small virtualized datacenter cabinet as cheaply as possible (Less than $40,000 for necessary software and hardware in less than 13u) that was capable of high-availability web, database, and infrastructure hosting for an organization with about 25 employees worldwide and tens of [...]]]></description>
			<content:encoded><![CDATA[<p>For one of my consulting gigs, I had a requirement to build out a small virtualized datacenter cabinet as cheaply as possible (Less than $40,000 for necessary software and hardware in less than 13u) that was capable of high-availability web, database, and infrastructure hosting for an organization with about 25 employees worldwide and tens of thousands of daily visitors to sites for several different large customers. Per the requirements, all of the mission-critical equipment at the infrastructure layer MUST be supported by the vendor &#8230; that means the hypervisor, hardware, network equipment, and any non-replicated operating systems MUST have support. A notable exception is made for places like webheads where per our design we&#8217;ll spawn instances to handle traffic spikes. </p>
<p>The web servers are *mostly* running <a href="http://drupal.org/">Drupal</a> installations, backed by MySQL. Drupal is a fairly hefty CMS and for a long time we&#8217;ve used several physical servers fronted with <a href="http://www.apsis.ch/pound/">Pound</a>. This is very obviously expensive &#8212; two servers alone from Rackspace cost $1500/mo. While we could do it cheaper in the cloud, we weren&#8217;t comfortable with the faceless/nameless cloud&#8217;s ability to meet the uptime requirements in our SLA, nor were we comfortable with the ability of the cloud vendors to provide the support levels that were required by our customers. </p>
<p>Building this out &#8212; </p>
<p>First, I decided to use VMWare ESXi and vCenter to fulfill the high-availability requirement. Since this was going to be a remote installation, I needed to be able to quickly and effectively diagnose problems and needed to have support due to the nature of the clients we were hosting. While I could&#8217;ve gone with Xen and SLES&#8217;s HA package, the license for the <a href="http://www.vmware.com/products/vsphere/small-business/buy.html">Essentials Plus</a> version of VMWare was actually cheaper than the necessary bits from Novell and Citrix&#8230; and unlike those products, VMWare supports failover and migration between hosts out of the box instead of requiring weird Pacemaker/OpenAIS cluster configurations. </p>
<p>Second, after pricing comparable hardware solutions, we ended up going with Dell. We priced HP, Sun/Oracle, and SuperMicro systems. The SuperMicro systems were closest in cost, but ended up failing on support. Compared to Dell&#8217;s Same Day/Next Business Day offerings, SuperMicro&#8217;s &#8220;go through your reseller&#8221; strategy falls way short of business expectations. Also, one neat thing about the <a href="http://www.dell.com/us/en/enterprise/storage/powervault-md3200/pd.aspx?refid=powervault-md3200&#038;cs=555&#038;s=biz">Dell MD3200/3220</a> is that it can handle up to 8 hosts connecting via 6 Gb/s SAS at once to the same LUN. Since we don&#8217;t expect to have anywhere near that number of hosts in the short term, we can effectively use VMFS. In the longer term, if we outgrew that number of hosts without upgrading to a fiber SAN, we could share VM partitions out via NFS and offload our highest concurrent storage user, MySQL, to its own storage. </p>
<p>Third, for the gateway/router/etc, we chose the <a href="http://www.cisco.com/en/US/products/ps6120/prod_models_comparison.html#~mid-range">Cisco ASA 5505 Security Plus</a> device. Two of them will fit in a 2u rack mount kit (actually 1.5, which is ideal for the crowded top of the colocation cabinet we&#8217;re in) and each has 8 switched 10/100 Ethernet ports. They&#8217;re pretty full-featured little boxes (note that you MUST buy a support contract if you want a warranty past 90 days though!)&#8230; sort of like a home cable modem, they&#8217;ll gateway, NAT, provide DHCP, and other basic networking functions out of the box &#8230; unlike your average home gateway, though, they support active/standby failover, a ton of diagnostics, routing, and shaping, and speak the same ASA/PIX language that we&#8217;re all familiar with. They also do VPN via everyone&#8217;s favorite Cisco VPN client. My other choice, besides the comparable Juniper Networks equipment, was to build my own system using BSD &#038; CARP or DD-WRT&#8230; while both of those might have been cheaper for the client, they didn&#8217;t fulfill the support requirement. </p>
<p>Last, we needed to provide an onsite and offsite backup capability. With 15k RPM disks running up to $1k, the best solution I could find was a <a href="http://www.buffalotech.com/products/network-storage/business/terastation-iii-ts-xlr5-ts-rxlr5/">2TB Buffalo TeraStation NAS</a>. I configured it as a 3-drive RAID5 with 1 hot spare. </p>
<p>When we were done, the total configuration looked something like this:<br />
- 3x Dell R610 w/ dual X5620 + 32gb RAM + no drives (boot from internal USB)<br />
- 1x Dell MD3200 w/ 2 MM, 7x 300gb 15k drives<br />
- 1x 2TB Buffalo TeraStation III NAS<br />
- 2x Cisco ASA 5505 + rack mount kit<br />
- VMWare Essentials Plus</p>
<p>For a few of the infrastructure bits, we purchased a license for Red Hat Enterprise Linux 5.5. For most of the webheads and other &#8216;disposable&#8217; VMs, we use CentOS 5.5. vCenter runs in a Windows Server 2008 R2 VM. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/building-out-a-small-datacenter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>tcprstat &#8211; TCP response time</title>
		<link>http://www.karlkatzke.com/tcprstat-tcp-response-time/</link>
		<comments>http://www.karlkatzke.com/tcprstat-tcp-response-time/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 20:47:23 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=611</guid>
		<description><![CDATA[Here&#8217;s a handy tool to help you troubleshoot busy servers &#8212; TCPRSTAT, beta software from the guys at Percona, who see a ton of problematic servers under load. Quality stuff.]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a handy tool to help you troubleshoot busy servers &#8212; <a href="http://www.mysqlperformanceblog.com/2010/08/31/introducing-tcprstat-a-tcp-response-time-tool/">TCPRSTAT</a>, beta software from the guys at Percona, who see a ton of problematic servers under load. Quality stuff. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/tcprstat-tcp-response-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oracle&#8217;s Really (not) Ruining Things</title>
		<link>http://www.karlkatzke.com/oracles-really-not-ruining-things/</link>
		<comments>http://www.karlkatzke.com/oracles-really-not-ruining-things/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 01:55:28 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[punditry]]></category>
		<category><![CDATA[sysadmin]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=573</guid>
		<description><![CDATA[MySQL and Java are apparently doing horribly under Oracle. &#60;/sarcasm&#62; &#8230; despite all the temper-tantrums the OSol guys are throwing.]]></description>
			<content:encoded><![CDATA[<p><a href="http://techie-buzz.com/foss/mysql-and-java-doing-well-under-oracle.html">MySQL and Java are apparently doing <i>horribly</i> under Oracle.</a> &lt;/sarcasm&gt; &#8230; despite all the temper-tantrums the OSol guys are throwing. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/oracles-really-not-ruining-things/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Backing Up Two Ways from Sunday</title>
		<link>http://www.karlkatzke.com/backing-up-two-ways-from-sunday/</link>
		<comments>http://www.karlkatzke.com/backing-up-two-ways-from-sunday/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 18:25:07 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[howto]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=555</guid>
		<description><![CDATA[One method of backup or recovery isn&#8217;t enough. Period. No matter what anyone tells you, what the book says, what your boss says, or what you think you need, you need to be backing things up in many ways. Here&#8217;s a few examples. MySQL Theoretically, you could recover anything you needed from the binary log, [...]]]></description>
			<content:encoded><![CDATA[<p>One method of backup or recovery isn&#8217;t enough. Period. No matter what anyone tells you, what the book says, what your boss says, or what you think you need, you need to be backing things up in many ways. </p>
<p>Here are a few examples. </p>
<h3>MySQL</h3>
<p>Theoretically, you could recover anything you needed from the binary log, as long as you&#8217;ve got a good starting point and a good ending point.  (This, by the way, is a good reason to flush the binary logs and take a backup on a regular basis.) What if your binary log&#8217;s corrupted, though? You need to fall back to a full SQL backup &#8230; which you&#8217;re doing regularly, right? </p>
<p>If your binary log is corrupted, any mirrors you are using that are based on that binary log are corrupted as well. </p>
<p>Case in point: I had a client with a very active, very large database&#8230; north of 15GB in InnoDB. The binary log hit a bug and corrupted itself. The backups were being done from a mirror so that they didn&#8217;t interrupt the main machine&#8217;s processing, but the client only kept a few days&#8217; worth, so we couldn&#8217;t use those backups to restore. The most recent un-corrupted dump from the main machine had been taken three months before. Luckily, the client had done some application-level backups to an XML format, and we were able to (laboriously) restore from that. It cost about $3,000 because they didn&#8217;t want to degrade their forum&#8217;s performance for a half hour every night and pay for an extra TB or so of storage to keep more than a few days&#8217; worth of backups. </p>
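<p>A minimal sketch of the regular flush-and-dump routine mentioned above, written in Python. It only builds the <code>mysqldump</code> command line. The output directory and file naming are hypothetical, and <code>--single-transaction</code> only gives a consistent snapshot for InnoDB tables, so treat this as a starting point rather than the exact procedure used for any client. </p>
<pre><code>import datetime


def build_dump_command(out_dir, day):
    """Return (argv, outfile) for a full dump that also rotates the binary log."""
    # Hypothetical naming scheme: one dated file per dump.
    outfile = "{}/full-{}.sql".format(out_dir, day.isoformat())
    argv = [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--flush-logs",          # start a fresh binary log at dump time
        "--master-data=2",       # record the binlog position as a comment
        "--result-file=" + outfile,
    ]
    return argv, outfile


if __name__ == "__main__":
    argv, outfile = build_dump_command("/backup/mysql", datetime.date.today())
    print(" ".join(argv))
</code></pre>
<p>Because the dump flushes the logs and records the binary log position, each dump gives you a known good starting point for replaying the binary logs written after it. Keep more than a few days of these. </p>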
<h3>Servers</h3>
<p>Scenario: Hard drive gets corrupted or dies. You need to get the machine back up quickly. You have a snapshot of the machine &#8230; but your snapshot is on the same storage as that machine unless you back it up somewhere else. </p>
<p>On top of that, storage requirements have been growing rapidly for servers. Where a Linux server can take less than 1GB, Windows 2008 R2 can take up 20GB with system files alone. (In fact, if you plan to have any data on that server, or keep any logs, we&#8217;d recommend going with 40GB minimum for your C: drive.) It&#8217;s important to back that up to something that&#8217;s not on the same system disks. </p>
<p>Better yet, take a hint from the application-level backups &#8212; and back up your registry, configuration files, and data separately from the snapshot. We tend to use rsync for this role and put it in a rolling-backup mode with the <code>--link-dest</code> option to ease recovery. </p>
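<p>Here is a rough Python sketch of what a rolling backup with <code>--link-dest</code> can look like. The paths and the one-directory-per-day layout are assumptions for illustration, not a description of our actual scripts. The idea is that each run writes a complete tree for today, but files unchanged since the previous run are hard links into yesterday&#8217;s tree, so they cost almost no extra space. </p>
<pre><code>import posixpath


def build_rsync_command(source, backup_root, today, yesterday):
    """Return the rsync argv for one run of a rolling, hard-linked backup."""
    dest = posixpath.join(backup_root, today)
    previous = posixpath.join(backup_root, yesterday)
    return [
        "rsync",
        "-a",                       # archive mode: permissions, times, symlinks
        "--delete",                 # drop files that no longer exist at the source
        "--link-dest=" + previous,  # hard-link files unchanged since the last run
        source.rstrip("/") + "/",   # trailing slash: copy contents, not the dir
        dest,
    ]


if __name__ == "__main__":
    # Hypothetical host layout: /backup/web01/YYYY-MM-DD
    print(" ".join(build_rsync_command(
        "/etc", "/backup/web01", "2010-02-22", "2010-02-21")))
</code></pre>
<p>Recovery is then just a copy out of whichever dated directory you want, with no incremental chain to replay. </p>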
<h3>VMWare</h3>
<p>Same principle as above. Snapshots are usually stored in the same datastore. Datastore goes bye-bye, so do your snapshots. </p>
<p>There are some great products out there that can really help with this issue. The one we use is <a href="http://www.veeam.com/vmware-esx-backup.html">VEEAM Replication and Backup</a>. It can be used to replicate a snapshot to another VMWare cluster, or back up the datastore files at a consistent snapshot point and then copy them elsewhere all in one step. We use a two-step process &#8212; we keep them locally on the backup server and also transmit them to another datacenter across campus. </p>
<p>When using VEEAM with Windows, make sure that VMWare Tools is installed and that you enable the VSS integration. (You&#8217;ll also need to make sure that the administrative share option on the system drives is enabled, and that the appropriate firewall ports are opened.) This ensures that you&#8217;ve got a transactionally consistent backup snapshot. </p>
<h3>Practice, practice, practice</h3>
<p>The only way to make sure that you can recover from a disaster is to test recovering from a disaster. At least once a year, we practice recovering from a worst-case scenario. That means bringing up a new machine from scratch, re-implementing all of the options and configurations, and then restoring the data. Despite that kind of restoration being something that should never happen, it does &#8212; and practice gives you insights into how to improve the processes and turns a recovery operation from an expensive nightmare that sets back all of your other processes into something that you can execute quickly and professionally. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/backing-up-two-ways-from-sunday/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting and Resolving LAMP Stack Problems &#8211; Scheduled Downtime</title>
		<link>http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-problems-scheduled-downtime/</link>
		<comments>http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-problems-scheduled-downtime/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 04:51:15 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[apc]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[myisam]]></category>
		<category><![CDATA[xcache]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=536</guid>
		<description><![CDATA[In the last issue of my current consulting saga, Detecting and Resolving LAMP Stack Performance Problems, we talked about a Drupal site that was being brought offline every few hours due to poor tuning of the LAMP stack. With the default settings, a site isn&#8217;t going to take much before it just falls flat on [...]]]></description>
			<content:encoded><![CDATA[<p>In the last issue of my current consulting saga, <a href="http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-performance-problems/">Detecting and Resolving LAMP Stack Performance Problems</a>, we talked about a Drupal site that was being brought offline every few hours due to poor tuning of the LAMP stack. With the default settings, a site isn&#8217;t going to take much before it just falls flat on its face. </p>
<p>After triaging and addressing the main issues based on the logs, we were left with two more issues. The first was the inability of Drupal to perform well in an environment where it had to rebuild every page from source for every page view. This is well documented in the Drupal community; there are many pages in the documentation area of Drupal that deal with caching and performance optimization. The second issue was MySQL performance and the long table lock/scan times we were seeing on some queries that could not be further optimized. </p>
<p>We scheduled a 2 hour downtime with the customer to install some tools. Our checklist was installing <a href="http://www.danga.com/memcached/">memcached</a> and <a href="http://pecl.php.net/package/APC">PHP-APC</a>. I also wanted to take the time to back up the MySQL database and run a good check_table on each of the MyISAM tables. (Yes, I know. MyISAM. More on that later.) </p>
<p><small>Side note: I would typically prefer <a href="http://xcache.lighttpd.net/">xcache</a>, which in my mind is superior to APC because I have an easier time working with it and prefer its management interface and tuning parameters. However, APC was available as a binary package for the platform we were on, and xcache was not. To make things faster and easier, we chose APC. Despite the endless debate about which is superior, both are usable and work. I have not run into problems using APC on an 8-core system, despite oft-reported-but-never-proven flock() issues.</small> </p>
<p>APC was fast to install and required minimal tuning. It produced a noticeable performance improvement. However, the number of deadlocked apache threads (and total number of apache threads) went up, and the other Apache errors that dealt with clients timing out did not cease. </p>
<p>We installed <a href="http://drupal.org/project/memcache">the Drupal Memcache implementation</a> along with the appropriate PECL module. We configured two pools, both using up to 1 GB of RAM (which we had to spare on the web server.) The &#8216;hot&#8217; pool would mostly handle cached pages for non-logged-in users, and the other one would handle some higher volume caching for users that are logged in, as well as some internal/custom functionality to go along with specialized RSS feed parsing. (Side note: We found that the <a href="http://drupal.org/project/cache">Cache</a> and <a href="http://drupal.org/project/cacherouter">Cacherouter</a> plugins did not work as expected. Rather than waste downtime troubleshooting them, we used what worked.)</p>
<p>Again, we saw a huge performance boost. We needed to do some tuning (changing certain cache settings and analyzing performance), but that was essentially everything that we could find to do from the web server side of a single-server setup. </p>
<p>While we&#8217;re on the topic of Drupal: Don&#8217;t forget that Drupal has a &#8216;cron&#8217; program that should be getting called remotely. It&#8217;s sort of a <a href="http://drupal.org/cron">poor man&#8217;s cron solution</a>, but it works. It was causing our load to spike every 20 minutes. We occasionally disabled it during testing to be sure we understood its effects. </p>
<p>The next beast to tackle was the database. As previously mentioned, it was on MyISAM tables. Obviously, this isn&#8217;t ideal. We found that node lookups, statistics lookups, and searches were taking up a disproportionate amount of server time because they were both  The weirdest part was that we were seeing some full table scans in the slow query log (i.e. 3 million rows scanned) but a later &#8216;explain&#8217; statement couldn&#8217;t replicate the performance recorded in the slow query log. </p>
<p>We batted around adding indexes. The issue was that Drupal&#8217;s search and nodes tables are frequently altered, which means the indexes become scrambled quickly. And really, what was taking time was the size of the table we were dealing with &#8212; the table wouldn&#8217;t fit in memory, so it was copying it to a disk temporary table and then doing a filesort. </p>
<p>Running check_table did the trick to re-sort the indexes and &#8216;defrag&#8217; the files, but the benefits only lasted so long. </p>
<p>What we ended up doing was taking the database down, dumping everything out to a SQL file, and re-importing everything to InnoDB. Make sure that <code>innodb_file_per_table</code> is enabled, or you might end up with some unexpectedly big files &#8212; this depends on your architecture and filesystem. Remember that InnoDB files cannot currently shrink. (Also: You <b>can</b> do the table changes online, but it&#8217;s really not recommended. It takes a long time, especially when some of your tables are larger than 1GB.) Don&#8217;t forget to <a href="http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/">set innodb_buffer_pool_size</a> appropriately.</p>
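<p>One way to handle the engine switch during the dump-and-reimport step is to rewrite the engine clause in the dump file before loading it. The Python sketch below is an illustration under assumptions, not the exact procedure we ran: it assumes a plain-text <code>mysqldump</code> file, and it blindly rewrites any matching text, including text that happens to appear inside row data, so check the result before importing. </p>
<pre><code>import re

# Matches the engine clause that mysqldump writes after each CREATE TABLE.
ENGINE_CLAUSE = re.compile(r"\bENGINE\s*=\s*MyISAM\b", re.IGNORECASE)


def convert_engine(dump_sql):
    """Rewrite every ENGINE=MyISAM clause in a dump to ENGINE=InnoDB."""
    return ENGINE_CLAUSE.sub("ENGINE=InnoDB", dump_sql)


if __name__ == "__main__":
    sample = "CREATE TABLE node (nid int) ENGINE=MyISAM DEFAULT CHARSET=utf8;"
    print(convert_engine(sample))
</code></pre>
<p>For a multi-gigabyte dump you would want to stream the file line by line rather than hold it in memory, but the substitution itself stays the same. </p>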
<p>The change to InnoDB, the caching of both PHP opcodes (via APC) and fully built pages (via memcached), and the careful tuning of Apache and MySQL parameters led to stability for this client. </p>
<p>There were some further problems, but they were with an unrelated product that causes a nightly load spike on the database machine. Tomorrow night I&#8217;ll be covering the cleanup work: NFS IOPS vs. local disk, binary logging and the lack of backups in the original configuration, and building some redundancy into the system so that it can tolerate faults more smoothly.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-problems-scheduled-downtime/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Detecting and Resolving LAMP Stack Performance Problems</title>
		<link>http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-performance-problems/</link>
		<comments>http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-performance-problems/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 03:45:34 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[deadlock]]></category>
		<category><![CDATA[lamp]]></category>
		<category><![CDATA[logging]]></category>
		<category><![CDATA[problems]]></category>
		<category><![CDATA[syslog]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=534</guid>
		<description><![CDATA[As sysadmins, we sometimes run into performance problems with multiple angles and portions. It&#8217;s sometimes not particularly obvious where the actual performance problem is, and resolving one problem that you can see might bring another couple of problems to the surface. The below comes from a consulting gig that I&#8217;ve been working on recently. [...]]]></description>
			<content:encoded><![CDATA[<p>As sysadmins, we sometimes run into performance problems with multiple angles and portions. It&#8217;s sometimes not particularly obvious where the actual performance problem is, and resolving one problem that you <i>can</i> see might bring another couple of problems to the surface. </p>
<p>The below comes from a consulting gig that I&#8217;ve been working on recently. The parties will remain nameless. I&#8217;m going to break this into several parts, since it took over three weeks to resolve all of the immediate <i>problems</i> with the site, and we&#8217;re still not all the way done with the task list. </p>
<p>Going in, I knew that we were dealing with a heavily loaded Drupal site that shared a mysql database with a wiki and a forum. The site would go down at random times &#8212; sometimes multiple times per hour. Upon logging into the server the first time, it seemed slow &#8212; so I immediately called &#8216;uptime&#8217; and the answer came back with all three time period load averages over 90 on an 8-core server. There were 125 Apache processes running, but most of them were in Deadlocked state. The very second command I ran on the server was <code>killall -9 httpd</code>, which is never the way you want to start out a consulting gig&#8230; </p>
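<p>To put that <code>uptime</code> output in perspective, it helps to divide the load averages by the number of cores: a load of 90 on 8 cores works out to more than 11 runnable processes per core. A small Python sketch of that arithmetic follows. The helper name is made up for illustration, and <code>os.getloadavg()</code> is only available on Unix-like systems. </p>
<pre><code>import os


def load_per_core(load_averages, cores):
    """Divide each load average (1, 5, 15 minutes) by the core count."""
    return tuple(round(load / cores, 2) for load in load_averages)


if __name__ == "__main__":
    # Reads the same three numbers that uptime prints.
    print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
</code></pre>
<p>As a rough rule of thumb, anything that stays well above 1.0 per core means processes are queuing for CPU or stuck waiting on I/O. </p>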
<p>While that was busy killing off processes, I checked the Apache configuration. Sure enough, it was still at the stock settings. I immediately cranked up the requests per process to 20,000 and upped the server limit to 300. (Remember, we&#8217;re dealing with prefork here.) I restarted Apache and watched it churn. It handled the load far more gracefully with some room to move around, and I quickly saw the number of Apache processes spike, and then sink down to about 80 and stay there. </p>
<p>The next step was looking through the logs. A quick aside about logs: I like my logs to be clean. I don&#8217;t like debug messages, I don&#8217;t like status messages, and I don&#8217;t want to see either of them. If I have a lot of a certain type of status message that I *do* want to trap, I make sure that syslog puts it into its own file or I handle the problem that&#8217;s causing it. In this case, <code>/var/log/messages</code> had a bunch of SNMP messages logging each get, and some messages about martian packets. The martian packets issue could be (and was) resolved with a quick firewall tweak to reject packets from an illegal source. The snmp issue was resolved by editing snmpd&#8217;s startup configuration to log to local1 instead of the default (check your man page for snmpd to make sure you get the right flags, it&#8217;s changed&#8230;), and then editing syslog&#8217;s configuration to log everything on local1 to /var/log/snmpd &#8212; and don&#8217;t forget to add it to logrotate! </p>
<p>Now we were down to two classes of errors. The first was obvious and sort of easy to troubleshoot: &#8220;MySQL server has gone away.&#8221; Log into the MySQL server. See if there are slow-running queries. Nope? Well, double check the timeout that&#8217;s set in <code>/etc/my.cnf</code> &#8212; on this server, slow-query-time was set to twenty seconds, but timeout was set to ten seconds. Well, that&#8217;s not very useful. Also, check your caches and table types. In this case, everything was MyISAM. More on that later &#8212; for now, just make sure you&#8217;re using the right kind of caching strategy for your table type and system specs, which in this case is MyISAM key cache (and lots of it!). Try to fit all of your most-used tables in memory. </p>
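<p>A mismatch like the one above is easy to check for mechanically. The Python sketch below compares a slow-query threshold against a timeout in the <code>[mysqld]</code> section of a config file. The option names <code>long_query_time</code> and <code>wait_timeout</code> are stand-ins chosen for illustration, since I have not named the exact variables involved above, and the fallback values are also assumptions; substitute whichever settings apply on your server. </p>
<pre><code>import configparser


def slow_log_threshold_exceeds_timeout(cnf_text,
                                       slow_option="long_query_time",
                                       timeout_option="wait_timeout"):
    """True when the slow-query threshold is at least as long as the timeout."""
    # allow_no_value lets bare flags such as skip-networking parse cleanly.
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    section = parser["mysqld"]
    # Fallbacks are illustrative defaults, used only when an option is absent.
    slow = float(section.get(slow_option, fallback="10"))
    timeout = float(section.get(timeout_option, fallback="28800"))
    return slow >= timeout


if __name__ == "__main__":
    example = "[mysqld]\nlong_query_time = 20\nwait_timeout = 10\n"
    print(slow_log_threshold_exceeds_timeout(example))
</code></pre>
<p>Note that real <code>my.cnf</code> files can contain <code>!include</code> directives, which this simple parser does not understand. </p>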
<p>On this gig, we got the site back on its feet with these things. Downtime went from multiple events an hour down to one or two events per six-hour period. Unfortunately, we were also out of easy things to change. Next time I post, we&#8217;ll start to get into fixes that will <i>cause</i> downtime. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/detecting-and-resolving-lamp-stack-performance-problems/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sun/Oracle OpenWorld &amp; Flash Storage</title>
		<link>http://www.karlkatzke.com/sunoracle-openworld-flash-storage/</link>
		<comments>http://www.karlkatzke.com/sunoracle-openworld-flash-storage/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 03:09:14 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[Sun]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=532</guid>
		<description><![CDATA[At the Sun OpenWorld conference keynote today, there were a few new products listed in the Flash storage arena &#8212; most notably the F5100 that everyone&#8217;s jibber-jabbering about. As a smaller customer, I&#8217;m far more interested in the SunFlash F20 PCIe card &#8212; which I don&#8217;t see many people blogging about. Looks like I could [...]]]></description>
			<content:encoded><![CDATA[<p>At the <a href="http://www.cuddletech.com/blog/pivot/entry.php?id=1079">Sun OpenWorld conference keynote today</a>, there were a few new products listed in the Flash storage arena &#8212; most notably the <a href="http://www.c0t0d0s0.org/archives/6003-Sun-Storage-F5100-officially-announced.html">F5100 that everyone&#8217;s jibber-jabbering about</a>. </p>
<p>As a smaller customer, I&#8217;m far more interested in the <a href="http://www.sun.com/storage/disk_systems/sss/f20/">SunFlash F20 PCIe card</a> &#8212; which I don&#8217;t see many people blogging about. Looks like I could add that to not only my existing systems, but <b>non-Sun</b> systems that can make use of that sort of storage. <i>That</i>, ladies and germs, is something worth the name &#8220;OpenWorld&#8221; &#8212; as in, a world of open wallets.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/sunoracle-openworld-flash-storage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reading/Googling List</title>
		<link>http://www.karlkatzke.com/readinggoogling-list/</link>
		<comments>http://www.karlkatzke.com/readinggoogling-list/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 04:42:54 +0000</pubDate>
		<dc:creator>karlkatzke</dc:creator>
				<category><![CDATA[reading list]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[iSCSI]]></category>
		<category><![CDATA[recession]]></category>
		<category><![CDATA[san]]></category>
		<category><![CDATA[vmware]]></category>
		<category><![CDATA[vSphere]]></category>
		<category><![CDATA[webcam]]></category>

		<guid isPermaLink="false">http://www.karlkatzke.com/?p=523</guid>
		<description><![CDATA[vSphere, SAN or iSCSI-related: Using iSCSI with vSphere &#8211; Pretty much the bible, they covered it all. 2TB drives are here, but Stephen Foskett identifies the issues with bringing them to the enterprise. In the same vein, he covers the death of RAID as a storage technology, and what lies beyond. I need to research [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li>vSphere, SAN or iSCSI-related:
<ul>
<li><a href="http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html">Using iSCSI with vSphere</a> &#8211; Pretty much the bible, they covered it all.</li>
<li><a href="http://blog.fosketts.net/2009/08/14/2-tb-enterprise-drives/">2TB drives are here, but Stephen Foskett identifies the issues with bringing them to the enterprise.</a> In the same vein, he covers the <a href="http://blog.fosketts.net/2008/09/14/turning-page-raid/">death of RAID as a storage technology, and what lies beyond</a>.</li>
<li>I need to research whether our <a href="http://www.vmwareinfo.com/2009/01/iscsi-hardware-or-software-how-many.html">iSCSI TOE cards are supported</a> by vSphere&#8230; </li>
<li><a href="http://blog.laspina.ca/">Ubiquitous Talk</a> might be my new favorite high-quality techie blog.</li>
</ul>
</li>
<li>Other:
<ul>
<li><a href="http://www.hightechdad.com/2009/08/12/webcam-monitoring-streaming-via-a-5-iphone-application-icam/">Streaming live webcams to your iPhone</a></li>
<li>This has been linked all over, but <a href="http://www.dailymail.co.uk/home/moslive/article-1212013/Revealed-The-ghost-fleet-recession-anchored-just-east-Singapore.html">the Ghost Fleet of the Recession</a> is anchored just off of Singapore, and it doesn&#8217;t look like it&#8217;s going anywhere soon. Sis wondered why she hadn&#8217;t seen the Florida in port recently; she ships a lot of containers with Maersk.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.karlkatzke.com/readinggoogling-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

