mzoz

Reputation: 1353

Temporal locality in memory mountain

In the CS:APP textbook, the description of the memory mountain says that increasing the working set size worsens temporal locality, but I feel like both the size and stride factors affect only spatial locality, since throughput drops when more data is spread sparsely across the lower-level caches.

Where does temporal locality come into play here? As far as I know, it means that the same memory address is referenced again in the near future, as described in this answer: What is locality of reference?

[memory mountain chart]

Upvotes: 4

Views: 1739

Answers (1)

Hadi Brais

Reputation: 23719

This graph is produced by sequentially traversing fixed-size elements of an array. The stride parameter specifies the distance, in elements, between two sequentially accessed elements. The size parameter specifies the total size of the array (including the elements that are skipped). The main loop of the test looks like this (you can get the code from here):

/* Sum every stride-th element of data[], 4-way unrolled; size is the array
   size in bytes and acc0..acc3 are separate double accumulators.
   limit keeps data[i+stride*3] in bounds, as in the book's code. */
limit = size / sizeof(double) - stride*4;

for (i = 0; i < limit; i += stride*4) {
    acc0 = acc0 + data[i];
    acc1 = acc1 + data[i+stride];
    acc2 = acc2 + data[i+stride*2];
    acc3 = acc3 + data[i+stride*3];
}

That loop is shown in Figure 6.40 of the book. What the book does not show or mention is that the loop is executed once to warm up the cache hierarchy, and then memory throughput is measured over a number of runs on the warmed-up cache; the fastest of these runs is the one used for the plotted throughput.
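This is not the book's actual measurement code (which uses cycle counters and a K-best timing scheme); it's only a minimal sketch of the protocol just described, using clock_gettime: warm up once, then time several runs on the warmed-up cache and keep the fastest. The array size, the run count, and the helpers test, seconds and run here are chosen purely for illustration:

#include <stdio.h>
#include <time.h>

#define MAXELEMS (1L << 23)             /* 8M doubles = 64 MiB */

double data[MAXELEMS];                  /* array being traversed */
double acc0, acc1, acc2, acc3;          /* globals so the sums aren't optimized away */

/* Same traversal as above: sum every stride-th element, 4-way unrolled. */
static void test(long elems, long stride)
{
    long limit = elems - stride * 4;
    acc0 = acc1 = acc2 = acc3 = 0.0;
    for (long i = 0; i < limit; i += stride * 4) {
        acc0 += data[i];
        acc1 += data[i + stride];
        acc2 += data[i + stride * 2];
        acc3 += data[i + stride * 3];
    }
}

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* One point on the mountain: warm up once, then time several runs on the
   warmed-up cache and keep the fastest, converted to MB/s. */
static double run(long size, long stride, int runs)
{
    long elems = size / sizeof(double);
    double best = 1e30;

    test(elems, stride);                    /* warm-up run, not measured */
    for (int r = 0; r < runs; r++) {
        double t0 = seconds();
        test(elems, stride);
        double t = seconds() - t0;
        if (t < best)
            best = t;
    }
    return (size / stride) / best / 1e6;    /* bytes read per run / best time */
}

int main(void)
{
    for (long i = 0; i < MAXELEMS; i++)
        data[i] = i;
    printf("128 KiB, stride 1: %8.0f MB/s\n", run(128 * 1024, 1, 10));
    printf(" 32 MiB, stride 1: %8.0f MB/s\n", run(32L << 20, 1, 10));
    printf("(checksum %g)\n", acc0 + acc1 + acc2 + acc3);
    return 0;
}

Sweeping size and stride over a grid of values with this run function and plotting the results would give a rougher version of the same mountain.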

Both the size and stride parameters together affect temporal locality (but only the stride affects spatial locality). For example, the 32k-s0 configuration has temporal locality similar to that of the 64k-s1 configuration, because the first and last accesses to every line are separated by the same number of cache lines. If you hold the size at a particular value and move along the stride axis, some lines that are repeatedly accessed at a lower stride are no longer accessed at higher strides, making their temporal locality essentially zero. (Temporal locality can be defined formally, but that isn't needed to answer the question.)

On the other hand, if you hold the stride at a particular value and move along the size axis, the temporal locality of each accessed line decreases as the size grows. However, performance deteriorates not because each accessed line has uniformly lower temporal locality, but because the working set is larger.
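As a back-of-the-envelope illustration of this point (not the book's code), assuming 64-byte cache lines, 8-byte doubles, and power-of-two strides, one can compute how many distinct lines a single run touches and how many accesses each touched line receives. The configurations passed to the hypothetical describe function below are arbitrary example points, not necessarily the chart's labels:

#include <stdio.h>

#define LINE 64                 /* assumed cache-line size in bytes */
#define ELEM 8                  /* sizeof(double)                   */

/* For one (size, stride) point: how many distinct lines does a single run
   touch, and how many accesses does each touched line receive? */
static void describe(long size_bytes, long stride)
{
    long step_bytes = stride * ELEM;                 /* distance between accesses */
    long hits_per_line = step_bytes < LINE ? LINE / step_bytes : 1;
    long lines_touched = size_bytes / (step_bytes < LINE ? LINE : step_bytes);

    printf("%5ldk, stride %2ld: %6ld lines touched, %ld access(es) per line\n",
           size_bytes / 1024, stride, lines_touched, hits_per_line);
}

int main(void)
{
    describe(32 * 1024, 1);
    describe(64 * 1024, 2);
    describe(64 * 1024, 8);          /* every access lands on a new line */
    describe(8 * 1024 * 1024, 1);
    return 0;
}

The number of touched lines is also roughly the reuse distance between one run's last access to a line and the next run's first access to it, which is why growing the size at a fixed stride pushes the working set out of successively larger caches.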

I think the size axis illustrates the impact of the working set size (the amount of memory the loop accesses during its execution) on execution time more than it illustrates temporal locality. To observe the impact of temporal locality on performance, the memory throughput of the first run of the loop should be compared against that of the second run of the same loop (same size and stride). Temporal locality increases by the same amount for each accessed cache line in the second run, and if the cache hierarchy is optimized for temporal locality, the throughput of the second run should be better than that of the first. In general, the throughput of each of N sequential invocations of the same loop, where N >= 2, should be plotted to see the full impact of temporal locality.
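The following is not code from the book or from the answer above, only a minimal sketch of that experiment under assumed conditions: it times N back-to-back runs of the same traversal (same size and stride) and prints each run's throughput, traversing a large dummy buffer first so that the first run starts from a (mostly) cold cache. The sizes, the run count, and the helpers seconds and traverse are made up for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Plain strided sum over the first elems elements of a. */
static double traverse(const double *a, long elems, long stride)
{
    double s = 0.0;
    for (long i = 0; i < elems; i += stride)
        s += a[i];
    return s;
}

int main(void)
{
    const long size  = 256 * 1024;                          /* working set: 256 KiB */
    const long elems = size / sizeof(double);
    const long flush_elems = (64L << 20) / sizeof(double);  /* 64 MiB flush buffer  */
    double *data  = malloc(size);
    double *flush = malloc(flush_elems * sizeof(double));
    volatile double sink = 0.0;

    for (long i = 0; i < elems; i++)       data[i]  = (double)i;
    for (long i = 0; i < flush_elems; i++) flush[i] = (double)i;
    sink += traverse(flush, flush_elems, 1);    /* push data[] out of the caches */

    /* Same size and stride every time: only temporal locality changes per run. */
    for (int run = 1; run <= 5; run++) {
        double t0 = seconds();
        sink += traverse(data, elems, 1);
        double t = seconds() - t0;
        printf("run %d: %7.0f MB/s\n", run, size / t / 1e6);
    }

    free(data);
    free(flush);
    return 0;
}

On a machine whose cache hierarchy exploits temporal locality, run 1 should report noticeably lower throughput than runs 2 through N.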

By the way, memory mountains on other processors can be found here and here. You can create a 3D mountain plot using this or this script.

Upvotes: 2
