<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>agigatech.com &#187; Write Cache</title>
	<atom:link href="http://agigatech.com/blog/tag/write-cache/feed/" rel="self" type="application/rss+xml" />
	<link>http://agigatech.com/blog</link>
	<description>AgigA Tech Inc Company Blog</description>
	<lastBuildDate>Wed, 30 Dec 2009 15:10:35 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Bulletproof Memory for RAID Servers, Part 2</title>
		<link>http://agigatech.com/blog/bulletproof-memory-for-raid-servers-part-2/</link>
		<comments>http://agigatech.com/blog/bulletproof-memory-for-raid-servers-part-2/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 22:45:43 +0000</pubDate>
		<dc:creator>AgigA Moderator</dc:creator>
				<category><![CDATA[backup]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[RAID]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Write Cache]]></category>

		<guid isPermaLink="false">http://agigatech.com/blog/?p=98</guid>
		<description><![CDATA[Just what is the real cost of the memory in a RAID server? Seems like a simple question, right? No matter what technology a RAID server design team uses to add non-volatile memory, there will be costs beyond the acquisition cost of the memory, and those extra costs should be factored into the system design if the design is to be competitive.]]></description>
			<content:encoded><![CDATA[<p>Just what is the real cost of the memory in a RAID server? Seems like a simple question, right? For volatile memories such as DRAM and SRAM, the cost is pretty much the purchase cost of the memory DIMMs. Sure, DRAM and SRAM modules might occasionally fail and require replacement, but the failure rate is pretty low, so the cost of those failures is also relatively low. Not true for non-volatile memory. No matter what technology a RAID server design team uses to add non-volatile memory, there will be costs beyond the acquisition cost of the memory, and those extra costs should be factored into the system design if the design is to be competitive.</p>
<p>As we discussed in <a href="../bulletproof-memory-for-raid-servers-part-1/" target="_blank">Part 1 of this blog entry series</a>, RAID servers must use non-volatile memory for their write caches to prevent data loss during power failures. There are many ways to achieve nonvolatility. One way is to back up the entire server with an uninterruptible power supply. That takes a lot of battery power or a diesel-driven generator (or a hydroelectric turbine, if there’s one handy). Another way is to use a much smaller battery to back up the RAM used as a write cache. Yet another is to use NAND Flash as a write cache. All of these design approaches have problems and no matter the approach, the server processor must be involved in safely preparing for the imminent loss of power. Let’s examine these last two design approaches more closely, assuming that diesel generators and water power are out of the question.</p>
<p>Backup batteries have short lives and require regular maintenance, which they often do not get. NAND Flash memory has relatively slow write times, so it makes a poor write cache when used directly. Worse, NAND Flash memory exhibits write-induced wearout failure. You really must minimize the number of times you write to Flash memory. For both of these reasons, using Flash memory like it’s RAM is clearly a misapplication of Flash memory technology.</p>
<p><strong>So what’s the real cost?</strong></p>
<p>Back to the original question posed in this blog entry: What’s the real cost of the memory in a RAID server? Let’s run a thought experiment and see where it takes us. Consider a battery-backed RAM. Besides the cost of the RAM, which is the same whether there’s battery backup or not, there’s the cost of the battery. What’s the cost of a battery pack? It’s on the order of $100 for the RAID server customer. However, if your customers are replacing these batteries annually as they should, then there’s roughly $500 worth of batteries to buy per server over the course of a four-year lifespan for the memory. (That’s $100 for the initial battery plus $100 for each of the four annual replacements.)</p>
<p>However, that’s not the only cost. Someone must go into the server room, take the server down, replace the battery, and then bring the server back up. For the sake of argument, let’s say it takes an hour for an IT tech to do all of this for one server. What’s the burdened cost of an hour of an IT tech’s time? Well, that number varies, but again it’s on the order of $100. And you need to do it four times over the course of the 4-year life of the server memory. That’s another $400. (We’re ignoring recycling costs here, but batteries should be recycled properly.)</p>
<p>So if battery maintenance occurs as it should, the cost of non-volatile server memory is roughly the cost of the memory plus $900 in battery and maintenance costs ($500 for batteries plus $400 for labor). These costs greatly exceed the cost of the memory itself.</p>
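<p>The arithmetic behind that $900 figure can be written out as a short sketch. It assumes the round numbers used above ($100 per battery pack, $100 per hour of IT tech time, one swap per year over four years); the function and constant names are purely illustrative.</p>

```python
# Round figures taken from the discussion above; all names are illustrative.
BATTERY_COST = 100         # dollars per battery pack
LABOR_COST_PER_SWAP = 100  # burdened cost of one hour of IT tech time
SERVICE_LIFE_YEARS = 4     # assumed life of the server memory


def scheduled_battery_cost(years=SERVICE_LIFE_YEARS,
                           battery=BATTERY_COST,
                           labor=LABOR_COST_PER_SWAP):
    """Return (battery_total, labor_total, grand_total) for annual swaps."""
    replacements = years                          # one swap per year of service
    battery_total = battery * (1 + replacements)  # initial pack plus replacements
    labor_total = labor * replacements            # one hour of labor per swap
    return battery_total, labor_total, battery_total + labor_total


batteries, labor, total = scheduled_battery_cost()
print(f"batteries ${batteries}, labor ${labor}, total ${total}")
```

<p>Changing any of the three inputs shows how quickly the total moves. The battery and labor figures are order-of-magnitude estimates, not quotes.</p>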
<p>But what if battery maintenance doesn’t take place as it should? What if the battery fails in service? What’s the cost then? Well, in this scenario, you need to make some big assumptions. First, you need to assume that the batteries are all properly monitored so that there’s an alert as soon as a battery fails. If not, then the RAID servers are always subject to catastrophic data loss because their write caches are unprotected from power failures. Actually, it’s not so easy to sense battery failure without putting a load on the battery, but let’s ignore this detail for now.</p>
<p>Next you need to assume that there’s a replacement battery handy, sitting ready to go on the shelf next to the server room, and that someone knows where this battery is stored. Otherwise the RAID server with the failed battery will need to be taken out of service and replaced with another server until a new battery can be found, flown in, or otherwise delivered from the warehouse, wherever that is. Battery spares are cheaper to keep on the shelf than spare RAID servers so it’s likely that it’ll be a spare battery on the shelf. Likely as not, the battery on the shelf won’t be fully charged, but let’s ignore that detail for now as well.</p>
<p>Finally, you need to assume that there’s always an IT tech on hand who knows how to replace a server backup battery and can act quickly when a battery fails.</p>
<p>These are all big assumptions and they are all most assuredly <span style="text-decoration: underline;">bad</span> assumptions, but they set a lower bound on the associated maintenance costs. An unattainable lower bound, most certainly, but a lower bound nevertheless.</p>
<p><strong>$300 for one failure, $500 for two</strong></p>
<p>If you make all of these assumptions, then the costs for server-memory nonvolatility using battery backup include the initial $100 battery cost, plus the cost of replacing the failed batteries over the four-year life of the server memory. In the highly unlikely event that there’s only one failure during that time, the one-time replacement cost is about $200 ($100 for the replacement battery plus $100 for the labor cost to replace it) for a total of $300 for the initial battery plus one replacement. If the battery fails twice during the four years, then the total cost is $500.</p>
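<p>As a rough sketch of this run-to-failure scenario, the lower-bound cost is the initial battery plus one battery and one hour of labor per failure. The function name and default values below are illustrative assumptions based on the round numbers in this entry, and the sketch deliberately ignores downtime costs.</p>

```python
def failure_driven_cost(failures, battery=100, labor=100):
    """Initial battery plus one replacement battery and one labor hour per failure."""
    return battery + failures * (battery + labor)


# Tabulate the lower bound for zero, one, and two failures in four years.
for n in (0, 1, 2):
    print(n, failure_driven_cost(n))
```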
<p>While this second scenario sets a lower bound on cost, it’s clearly built on unrealistic assumptions. There will most certainly be unplanned downtime with this scenario.</p>
<p>Batteries almost never fail at convenient times. They seem to have a sixth sense about these things. Batteries fail at night and when the IT team is otherwise occupied. So you also need to figure in the cost of lost business due to the unplanned server outage. Realistically, that’s clearly going to happen.</p>
<p><strong>Lost time counts too</strong></p>
<p>Now the dollar value of lost data is really tough to set. However, as discussed in the <a href="../bulletproof-memory-for-raid-servers-part-1/" target="_blank">previous blog entry</a>, an hour’s loss of server time could easily cost a large customer thousands or millions of dollars, especially if that server customer is Amazon, Google, or a fast-transaction securities trader that relies on response times that are microseconds faster than competing traders. For such customers, the cost of server memory is clearly irrelevant because uninterrupted server uptime is so very valuable to them. These customers know to the penny what server uptime is worth per minute, per second, and even per millisecond. That’s how valuable server uptime is to this class of customer.</p>
<p><em>These customers don’t want to know how much the memory in the server costs. They want to know how the server’s design will prevent unplanned downtime.</em></p>
<p>The server design team must therefore have bulletproof, nonvolatile memory as a goal. This memory should not require annual maintenance, so that the server’s design avoids both frequent planned downtime and unplanned downtime due to memory failure. The economics of this goal are simply undeniable.</p>
<p>If you’re thinking that this discussion is leading to a discussion of why AgigA Tech’s approach to non-volatile server memory is worth more money, you’re wrong. After taking maintenance costs into account, AgigA Tech’s AGIGARAM modules actually cost less. Taking the cost of lost data and server downtime into account, AGIGARAM modules cost a lot less. We’ll discuss that in the next blog entry.</p>
]]></content:encoded>
			<wfw:commentRss>http://agigatech.com/blog/bulletproof-memory-for-raid-servers-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bulletproof Memory for RAID Servers, Part 1</title>
		<link>http://agigatech.com/blog/bulletproof-memory-for-raid-servers-part-1/</link>
		<comments>http://agigatech.com/blog/bulletproof-memory-for-raid-servers-part-1/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 23:26:20 +0000</pubDate>
		<dc:creator>AgigA Moderator</dc:creator>
				<category><![CDATA[backup]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[ultra-capacitor]]></category>
		<category><![CDATA[ultracapacitor]]></category>
		<category><![CDATA[RAID]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[Write Cache]]></category>

		<guid isPermaLink="false">http://agigatech.com/blog/?p=94</guid>
		<description><![CDATA[Envision a data center with row upon row of rack-mounted RAID servers. All of these servers have battery-backup units for their RAM caches but buried somewhere deep inside this maze of racks, there’s a battery years past its prime. Perhaps there are several such batteries. These batteries are supposed to be changed out annually, but [...]]]></description>
			<content:encoded><![CDATA[<p><em>Envision a data center with row upon row of rack-mounted RAID servers. All of these servers have battery-backup units for their RAM caches but buried somewhere deep inside this maze of racks, there’s a battery years past its prime. Perhaps there are several such batteries. These batteries are supposed to be changed out annually, but you know how things go. Sometimes, preventative maintenance just doesn’t happen on time. Or at all.</em></p>
<p><em>In fact, one of those batteries has failed. The RAM cache it protects is at risk when the next power outage occurs. When that happens, one or more of the data center’s customers will lose data. Critical data. After all, what data isn’t critical?</em></p>
<p><em>Worse, the failed battery is leaking. Acid is oozing out of the battery. It’s quite possible that the acid is leaking onto critical circuitry inside of the RAID enclosure. Drip. Drip. Drip. The acid starts to etch into the circuitry. The disaster is perhaps moments away&#8230;</em></p>
<p>Customers buy one thing from RAID vendors: a safe haven for their bits. The bulletproof aspect of a RAID array’s disk storage resides in the redundancy of the disk drives themselves. A RAID 5 array protects against data loss should one disk drive fail and a RAID 6 array protects against data loss should two drives fail. Both types employ disk striping with parity (double parity for RAID 6). Because data has value, and some data has tremendous value, the use of RAID systems based on hardware RAID controllers is skyrocketing. However, power loss can negate the efficacy of a RAID system and put the data at risk.</p>
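<p>To make the parity idea concrete, here is a minimal sketch of single-parity (RAID 5 style) recovery for one stripe. It is a simplification: real arrays rotate parity across drives and RAID 6 adds a second, differently computed parity block. The drive contents and the <code>xor_blocks</code> helper are illustrative assumptions.</p>

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three hypothetical data drives.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # parity block stored on a fourth drive

# Simulate losing the second drive, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

<p>Because XOR is its own inverse, any single missing block can be recomputed from the remaining blocks, which is why one drive failure does not lose data.</p>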
<p>One critical point of failure in RAID systems with respect to power outages is the write cache. RAID systems employ write caches to speed disk transactions—to boost the IOPS (I/O operations per second) rating. Once a computer system squirts a chunk of data into a RAM-based cache, the RAID system can immediately acknowledge the transaction before actually writing the data to disk. So there’s a critical period of time when the data is at risk from a power failure, after the acknowledgement but before the data is on the disk. If power is lost while the data is in RAM cache, then it’s lost forever.</p>
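<p>The risk window described above can be illustrated with a toy model of a write-back cache. This is only a sketch under simplifying assumptions (dictionaries stand in for RAM and disk, and the class and method names are hypothetical), not a description of any real RAID controller.</p>

```python
class WriteBackCache:
    """Toy model: writes are acknowledged as soon as they reach volatile RAM."""

    def __init__(self):
        self.ram = {}   # volatile cache, contents vanish on power failure
        self.disk = {}  # persistent storage

    def write(self, block, data):
        self.ram[block] = data
        return "ack"    # acknowledged before the data is on disk

    def flush(self):
        """Commit cached blocks to disk and empty the cache."""
        self.disk.update(self.ram)
        self.ram.clear()

    def power_failure(self):
        """Drop the cache and report acknowledged blocks that never reached disk."""
        lost = sorted(self.ram)
        self.ram.clear()
        return lost


cache = WriteBackCache()
cache.write(1, b"safe")
cache.flush()               # block 1 is now on disk
cache.write(2, b"at risk")  # acknowledged, but still only in RAM
lost = cache.power_failure()
print(lost, cache.disk)
```

<p>Block 2 was acknowledged to the host yet never reached the disk, which is exactly the exposure a protected write cache is meant to close.</p>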
<p>One way to avoid this problem entirely is to disable the RAID system’s RAM cache. This approach preserves the data but with a huge performance hit. No RAM cache, no performance.</p>
<p>Another way to avoid the problem is to protect the data in a write cache from power failures using a battery-backup unit (BBU). That way, the RAID controller can recognize an impending power failure and halt transactions, and the BBU will maintain any data yet to be written to disk and thus ride through the power failure.</p>
<p>Sounds great in theory, but in practice there are many problems with BBUs:</p>
<ul>
<li>Batteries have short, finite lifetimes compared to other electronic components and heat further shortens their electrochemical lives. There’s heat aplenty inside most server enclosures. Consequently, battery health should be closely monitored but it’s often not monitored at all. In fact, some data-center operations teams are surprised to discover that there’s a high-maintenance battery inside of many RAID systems. Of course, by the time they realize that there’s a battery to be maintained, it’s often too late because the event that brought this fact to light was a data failure induced by power loss.</li>
</ul>
<ul>
<li>Batteries need to be replaced every one to two years. First, that’s not going to happen if no one knows there’s a battery to be replaced. Second, battery maintenance often falls pretty low on the priority list of tasks to be performed and the replacement may be dangerously deferred when it’s done at all. Third, there’s no standardization in BBUs so the correct battery pack may not be on hand. Worse, the required BBU may be discontinued and no longer available. If you can’t order a new one, then what? Fourth, battery packs cost money and so does the time it takes to install new ones.</li>
</ul>
<ul>
<li>When replacing the BBU, the RAID server must be taken offline, or at least the RAM cache must be taken offline, and it must stay offline until the BBU charges up. RAID performance suffers during the downtime. Consumer-level products such as PCs and PVRs (personal video recorders) may not benefit much from faster disk drives. Enterprise systems do. Enterprise computing clients know precisely what a second’s worth of delay costs in their business. Sometimes a microsecond’s delay costs big money. For example, Google and Amazon know to the penny what each additional second of response delay costs them in terms of lost customer purchases. High-frequency securities traders and arbitrage houses employ trading strategies that are highly dependent on ultra-low latency networks. In fact, they co-locate their trading servers with the trading floor to minimize communications latency with the computers at the market exchange. These traders profit only by feeding information on competing bids and offers to their trading algorithms microseconds faster than their competitors. Loss of write-cache performance in a RAID system could literally cost such traders millions of dollars per microsecond of delay.</li>
</ul>
<ul>
<li>Batteries are not environmentally friendly so it’s a bad idea to just toss them in the trash. Batteries should be properly recycled and proper recycling is expensive, beyond the cost of the replacement BBU. Even when recycled properly, batteries just aren’t that great for the environment.</li>
</ul>
<p>So what’s the right answer to the need for bulletproof RAID write cache? AgigA Tech believes that the answer can be found in a fusion of NAND Flash and ultra-capacitor technologies. Ultra-capacitors are essentially made of benign carbon and have many superior qualities compared to batteries. In particular, they charge faster (less downtime) and they have longer lives (when properly applied). NAND Flash can save a RAM cache’s contents indefinitely and without power. So AgigA Tech’s AGIGARAM modules can be used as RAID RAM-cache modules, providing all of the benefits of battery-backed write caches but without the many liabilities batteries incur.</p>
<p>What about the cost of such an approach? Stay tuned. We’ll address that in the next blog entry.</p>
]]></content:encoded>
			<wfw:commentRss>http://agigatech.com/blog/bulletproof-memory-for-raid-servers-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

