I am trying to understand why a program using arrays (such as matrix multiplication) can be written in some way to take advantage of spatial locality of the cache. Does spatial locality provided by a cache refer to locality within virtual memory, physical memory or both? When a computer system brings a block of data from main memory to CPU cache, does it bring virtually or physically contiguous data objects into CPU cache? When we define an array or an object of a structure either non-dynamically or dynamically (via malloc()), is it correct that such an array or object is allocated contiguously? Does "contiguous" refer to virtual memory or physical memory or both? If spacial locality of cache is for physical memory and not necessarily virtual memory, and OS can allocate to a C program virtually not necessarily physically contiguous arrays, how can we write a program to take advantage of the spatial locality of a cache? Thanks.

Reputation:

Does spatial locality provided by a cache refer to virtual memory, physical memory or both?

I am trying to understand why a program using arrays (such as matrix multiplication) can be written in some way to take advantage of spatial locality of the cache.

Does spatial locality provided by a cache refer to locality within virtual memory, physical memory or both? When a computer system brings a block of data from main memory to CPU cache, does it bring virtually or physically contiguous data objects into CPU cache?
When we define an array or an object of a structure either non-dynamically or dynamically (via malloc()), is it correct that such an array or object is allocated contiguously? Does "contiguous" refer to virtual memory or physical memory or both?

If spacial locality of cache is for physical memory and not necessarily virtual memory, and OS can allocate to a C program virtually not necessarily physically contiguous arrays, how can we write a program to take advantage of the spatial locality of a cache?

Thanks.

Upvotes: 7

Answers (2)

user3344003

Reputation: 21647

Assume that you are using a programming language that only supports one dimensional arrays.Let's say that you have a 3x3 matrix. You implement two dimensional arrays by

a [i, j] = a (i*3 + j)

If you structure your array access. If you iterate through the elements of the array, if your outer loop index is i and your inner loop index is j you access in order:

a(0), a(1), a(2), ..... a(8)

If you make j the outer loop index and i the inner loop index you access in order:

a(0), a(3), a(6), a(1), a(4), a(7), a(2), a(5), a(8)

Your are jumping around in your array. This jumping causes havoc with caches because caches expect to grab memory in groups.

This problem still exists in programming languages with multiple dimension array. In that case the compiler translates the multiple dimensions into a single dimension for you. The problem that you have is different programming languages treat the ordering of the subscripts differently.

Upvotes: 0

mevets

Reputation: 10445

1) Really both, but why is subtle.

2) Caches operate on blocks of data called lines, and the bytes within a line are both virtually and physically contiguous. Typical line sizes are 16,32,64 bytes. Two adjacent cache lines must be physically contiguous if they lie within the same page. Typical page sizes are 4,8,16 K. So a machine with a 32 byte cache line and 4K base page has 128 lines per page.

3,4) In C members of a structure, union or array are virtually contiguous. It is up to the operating system whether it will be physically contiguous.

(1) Part 2: There is another cache called the Translation Lookaside Buffer (TLB) which retains recently used page mappings. Without such a mechanism, every memory reference would require two physical memory references: one to load the memory address translation, which it would then apply to generate the desired memory reference.

Suppose your TLB had 32 entries ( stupidly small these days ), and you had code which walked an array like this:

char *p;
for (p = array; p < array + 4096; p++) {
     char *q;
     for (q = p; q < p + 32 * 4096; q += 4096) {
           *q += 1;
     }
}

You would, effectively mimic a machine with no TLB, since each memory reference of ‘*q’ would miss in the TLB and need to be fetched from memory.

You can construct a similarly pathological case for the memory cache if you know the details of the cache associativity and size; or if you are unlucky you can accidentally hit it and wonder why your program is so slow.

Upvotes: 2

Does spatial locality provided by a cache refer to virtual memory, physical memory or both?

Answers (2)

Related Questions