Reputation: 6192
Can someone post a simple explanation of cache-aware algorithms? There are a lot of links available, but the reading material on those sites is academic in nature and time consuming to read and comprehend.
Upvotes: 13
Views: 8675
Reputation: 61077
Let's take a simple dynamic array list as an example.
A cache-oblivious implementation might start out with an empty array and then grow it geometrically: 1, 2, 4, 8, and so on. Every time we need to grow the array, we double the capacity.
A cache-line-aware solution could pick an initial capacity based on prevalent cache line sizes, e.g. 64 bytes.
If we are storing 32-bit integers in our list, we might start with 64/4 = 16 as the initial capacity. Reading 1 byte costs the CPU roughly the same as reading 64, because data moves in and out of cache in whole cache lines. Then maybe we grow the list in increments of 64 bytes (to minimize waste), because that is reasonable in this hypothetical case.
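A minimal sketch of that idea (the 64-byte line size, the one-line-at-a-time growth policy, and the realloc-based storage are all assumptions for illustration, not a production design):

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch: a tiny int list whose capacity is always a multiple of an
// assumed 64-byte cache line (16 ints per line).
struct LineAwareList {
    static const std::size_t kLineBytes   = 64;                    // assumed line size
    static const std::size_t kIntsPerLine = kLineBytes / sizeof(int); // 16
    int*        data     = nullptr;
    std::size_t size     = 0;
    std::size_t capacity = 0;

    void push_back(int v) {
        if (size == capacity) {
            // Grow by exactly one cache line's worth of ints.
            std::size_t newCap = capacity + kIntsPerLine;
            data = static_cast<int*>(std::realloc(data, newCap * sizeof(int)));
            capacity = newCap;
        }
        data[size++] = v;
    }

    ~LineAwareList() { std::free(data); }
};
```

Note the trade-off: growing by a fixed increment instead of doubling keeps waste below one cache line, but copies more often as the list gets large, which is exactly the kind of data-specific decision the answer is describing.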
Cache-aware algorithms aren't general-purpose solutions; instead, they ask that you tailor the algorithm to fit the data. Any time you take data layout into account, you are making a cache-aware algorithm.
What you need to know is simply what will result in a cache miss and what won't. If your data is accessed in a linear, predictable manner, the CPU will prefetch successfully and bring it into cache before you need it, avoiding stalls.
Linked lists can be horrible because (depending on the situation) they involve a lot of indirection. However, you can allocate linked-list nodes in a way that leads to predictable memory access patterns, making them more cache friendly.
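One common way to do that (a sketch, assuming a vector-backed pool and index links instead of raw pointers):

```cpp
#include <vector>

// Sketch: all nodes live contiguously in one std::vector ("pool"), so
// traversing the list in insertion order also walks memory front to back,
// which the prefetcher handles well.
struct Node {
    int value;
    int next;   // index of the next node in the pool, -1 for end of list
};

struct PooledList {
    std::vector<Node> pool;   // contiguous backing storage
    int head = -1;
    int tail = -1;

    void push_back(int v) {
        pool.push_back({v, -1});
        int idx = static_cast<int>(pool.size()) - 1;
        if (tail == -1) head = idx;
        else            pool[tail].next = idx;
        tail = idx;
    }

    int sum() const {
        // Follows the links, but the nodes happen to be laid out in order.
        int s = 0;
        for (int i = head; i != -1; i = pool[i].next) s += pool[i].value;
        return s;
    }
};
```

The indirection is still there, but because nodes were appended in order, following the links touches adjacent memory rather than jumping around the heap.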
That's all there is to it. Proper cache utilization can easily net you a 10x performance gain; improper utilization can easily cost you a 100x performance loss.
Upvotes: 0
Reputation: 3841
I think one of the simplest examples of a cache-aware algorithm is accessing a two-dimensional array in row-major vs. column-major order. Since a two-dimensional array is usually stored in memory as a concatenation of its rows, accessing it row by row brings the right data into cache at the right time. Accessing it in column-major order, however, produces lots of jumps in memory and cache misses, which can cause a big slowdown.
To give an example, this C++ code:
    const int MAX_N = 1024;
    static int a[MAX_N][MAX_N];

    for (int i = 0; i < MAX_N; ++i) {
        for (int j = 0; j < MAX_N; ++j) {
            a[i][j] = 10;
        }
    }
runs 3-4 times faster on my machine than if I swap the indices of the accessed cell (that is, access a[j][i]
instead).
Upvotes: 9
Reputation: 134125
A cache-aware algorithm is designed to minimize the movement of data in and out of the processor's on-chip memory cache. The idea is to avoid "cache misses," which cause the processor to stall while it loads data from RAM into the cache.
A cache-aware algorithm that is less than optimum on paper can outperform a traditional algorithm that is in theory "faster," because the cache-aware algorithm uses memory more efficiently.
A cache-aware algorithm is explicitly coded to take advantage of the processor's cache behavior. Intimate details such as the processor's cache size and cache line size are coded into the algorithm. As such, a cache-aware algorithm will be highly processor specific.
A cache-oblivious algorithm is coded to use memory in a more cache-friendly manner than a traditional algorithm, but it does not depend on intimate details about the underlying hardware.
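The classic cache-oblivious pattern is recursive subdivision: keep splitting the problem until, at some level of the recursion, the sub-problem fits in whatever cache the machine happens to have, with no size hard-coded. A minimal sketch using matrix transpose (the 8x8 dimension and base-case threshold of 2 are arbitrary choices for illustration):

```cpp
#include <cstddef>

const std::size_t N = 8;   // example matrix dimension (assumed)

// Transpose a[r0..r1)[c0..c1) into b by recursively halving the longer
// dimension. No cache size or line size appears anywhere; some recursion
// depth will fit in cache on any machine.
void transpose(const int a[N][N], int b[N][N],
               std::size_t r0, std::size_t r1,
               std::size_t c0, std::size_t c1) {
    if (r1 - r0 <= 2 && c1 - c0 <= 2) {
        // Base case: block is tiny, copy directly.
        for (std::size_t i = r0; i < r1; ++i)
            for (std::size_t j = c0; j < c1; ++j)
                b[j][i] = a[i][j];
    } else if (r1 - r0 >= c1 - c0) {
        std::size_t m = (r0 + r1) / 2;        // split rows
        transpose(a, b, r0, m, c0, c1);
        transpose(a, b, m, r1, c0, c1);
    } else {
        std::size_t m = (c0 + c1) / 2;        // split columns
        transpose(a, b, r0, r1, c0, m);
        transpose(a, b, r0, r1, m, c1);
    }
}
```

Compare this with the cache-aware approach, which would instead loop over blocks of an explicit, hardware-specific size.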
Upvotes: 15