Memory performance.

A friend of mine recently pointed me to a really good article about the performance characteristics of RAM.  The article is really long and goes into a whole lot of detail about how RAM works, complete with circuit schematics and all.  While it covers a ton of useful topics, only a few of them apply to most people on a day-to-day basis, so I figured I’d write a quick summary of those for anyone who doesn’t have time to read the whole thing.  Please note that my re-hash isn’t meant to be a replacement for the article; it’s just a summary (along with a quick code sample for the adventurous =)).

Probably the most important thing that this article covers is the cache.  The cache is a small amount of very high performance memory (called SRAM) that holds a copy of data that resides in RAM.  The point of the cache is to keep the processor from having to go out to RAM every time it needs data.  Because the cache is much faster than RAM, organizing data for cache efficiency can be a *huge* win.  One common optimization for cache usage is called strip-mining, which is the process of breaking up one long loop and replacing it with many smaller loops, each of which works on a chunk (a “strip”) of the data that is small enough to stay in the cache.  If you then re-organize the data needed by each loop so that it is tightly packed, you can achieve even more cache hits!  The idea is to both limit and localize the memory accesses made per loop iteration, which causes more cache hits, which in turn gives *much* better performance.  To give you an idea, this optimization can sometimes speed up code by a full order of magnitude!
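
To make the strip-mining idea a bit more concrete, here is a minimal sketch in C.  It is my own illustration rather than anything taken from the article: the function names and the `STRIP` size (4096 doubles, or 32 KiB) are assumptions, and the right strip size depends on your processor’s cache.  Both functions do the same two-step job, but the second one finishes both steps on one strip before moving on to the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strip size: 4096 doubles = 32 KiB. Tune for your cache. */
enum { STRIP = 4096 };

/* Baseline: two full passes over the array. When the array is much larger
 * than the cache, the second pass has to fetch everything from RAM again. */
double scale_then_sum_squares(double *x, size_t n, double a, double b)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        x[i] = a * x[i] + b;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * x[i];
    return sum;
}

/* Strip-mined: the same two steps, but run back to back on one strip at a
 * time, so the second inner loop reads data the first one just touched. */
double scale_then_sum_squares_strips(double *x, size_t n, double a, double b)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t start = 0; start < n; start += STRIP) {
        /* The last strip may be shorter than STRIP. */
        size_t end = (n - start < STRIP) ? n : start + STRIP;

        for (size_t i = start; i < end; ++i)
            x[i] = a * x[i] + b;
        for (size_t i = start; i < end; ++i)
            sum += x[i] * x[i];
    }
    return sum;
}
```

Both versions visit the elements in the same order, so they produce the same result.  In a case this simple you could just merge the two loops into one; strip-mining earns its keep when the steps are too complicated to merge but can still be run one strip at a time.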

The second most important thing that this article covers is the performance characteristics of RAM itself.  While RAM stands for “random access memory”, it turns out that non-sequential access is actually slower than sequential access.  One way to deal with this is to make a copy of the data that you’re going to be reading from.  If the data is small enough to fit in the cache, then you pay the cost of reading it from RAM once, sequentially, while making the copy, and from then on you’re working on data that sits in the cache.  Because the processor’s caching mechanism is functionally transparent to the application, the best way to do this is to simply allocate a buffer on the stack, copy the data, then read from that buffer.  It’s very counter-intuitive given that you end up doing extra work to gain speed, but in some cases, this can really help out your performance.  In the end, you’re simply performing CPU operations now to prevent extra RAM operations in the future, and because the CPU is much faster than RAM, it ends up being a (sometimes big) win =).
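
Here is a rough sketch of the copy-to-a-stack-buffer trick in C.  Everything in it is made up for illustration: the `Record` layout, the `RECORDS` count, and the function names are all assumptions.  The point is only that the values you actually need are read once, in order, into a small packed buffer, and the non-sequential lookups then go to that buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size: 1024 weights * 4 bytes = a 4 KiB stack buffer. */
enum { RECORDS = 1024 };

/* Hypothetical record: the one field we need, plus 252 bytes we don't. */
typedef struct {
    uint32_t weight;
    char     payload[252];
} Record;

/* Direct version: every lookup lands in a different 256-byte record, so the
 * non-sequential reads are spread across 256 KiB of memory. */
uint64_t sum_weights_direct(const Record *recs,
                            const uint16_t *picks, size_t n_picks)
{
    uint64_t total = 0;
    for (size_t k = 0; k < n_picks; ++k) {
        assert(picks[k] < RECORDS);
        total += recs[picks[k]].weight;
    }
    return total;
}

/* Copy version: one sequential pass packs the weights into a stack buffer,
 * then the non-sequential lookups only ever touch those 4 KiB. */
uint64_t sum_weights_copied(const Record *recs,
                            const uint16_t *picks, size_t n_picks)
{
    uint32_t weights[RECORDS];
    for (size_t i = 0; i < RECORDS; ++i)
        weights[i] = recs[i].weight;

    uint64_t total = 0;
    for (size_t k = 0; k < n_picks; ++k) {
        assert(picks[k] < RECORDS);
        total += weights[picks[k]];
    }
    return total;
}
```

Whether the copy pays for itself depends on how many lookups you do afterwards.  For a handful of lookups the extra pass is wasted work, so measure before committing to it.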

In order to illustrate RAM timings, I’ve put together a quick demo that shows random access vs. sequential access.  Because I need to bypass the cache in order to accurately time RAM access speed, I end up having to use some inline assembly language.  For all of you assembly people out there, I use movntdqa to perform cache-bypassing reads =).  That instruction is part of SSE4.1, so it is only supported on the 45 nm Core 2 (Penryn) chips and better.  Please note that this code is designed for and has only been tested on an Intel i7 processor.
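
For anyone who wants to experiment without inline assembly, here is a portable sketch in C.  To be clear, this is not the demo described above: it uses ordinary cached loads rather than movntdqa, so it only says something about RAM if the buffer is far larger than your cache.  All of the function names are hypothetical, and the shuffle uses a simple xorshift generator that is fine for mixing up indices but not for anything serious.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Fill order[] with 0, 1, ..., n-1 (the sequential visiting order). */
void fill_sequential(uint32_t *order, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        order[i] = (uint32_t)i;
}

/* Fisher-Yates shuffle driven by a xorshift32 generator.
 * The seed must be non-zero or the generator gets stuck at zero. */
void shuffle(uint32_t *order, size_t n, uint32_t seed)
{
    assert(seed != 0);
    uint32_t s = seed;
    for (size_t i = n; i > 1; --i) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        size_t j = s % i;               /* pick a slot in [0, i) */
        uint32_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* Visit data[] in the order given by order[] and add everything up.
 * The sum keeps the compiler from throwing the reads away. */
uint64_t sum_in_order(const uint32_t *data, const uint32_t *order, size_t n)
{
    uint64_t total = 0;
    for (size_t k = 0; k < n; ++k)
        total += data[order[k]];
    return total;
}

/* Time one traversal with clock(); returns CPU seconds and stores the sum. */
double time_traversal(const uint32_t *data, const uint32_t *order, size_t n,
                      uint64_t *sum_out)
{
    clock_t t0 = clock();
    *sum_out = sum_in_order(data, order, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

To try it, allocate `data` and two `order` arrays with a few tens of millions of elements each, call `fill_sequential` on both order arrays, `shuffle` one of them, and compare the two `time_traversal` results.  The sums should match exactly; only the timings should differ.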

Disclaimer:  Processors are *very* complex these days, and they contain a lot of hardware dedicated to optimizing certain operations.  Because this hardware is entirely automatic from the programmer’s point of view, it is often difficult or impossible to bypass all of the hardware optimizations that might be going on.  As a result, this test may not be completely accurate, and it may not provide exact numbers.  Interference from the operating system and other applications can also greatly affect the performance of this code, and results can sometimes vary by a noticeable margin.  However, in all of my runs, sequential RAM access was *always* faster than non-sequential RAM access.  If anyone notices any bugs, please feel free to point them out and I’ll make the necessary fixes ASAP!

~ by ebray99 on June 20, 2010.
