Reputation: 405
I was reading a book that contains this paragraph:
Arrays in C can be seen as a contiguous chunk of memory. More precisely, the last dimension of the array is the contiguous part. We call this row-major order. Understanding this, and the fact that a cache fault loads a complete cache line into the cache when uncached data is accessed (to prevent subsequent cache faults), we can see why accessing an array of dimension 10000x10000 with array[0][0] would potentially load array[0][1] into the cache, but accessing array[1][0] right after would generate a second cache fault, since it is sizeof(type)*10000 bytes away from array[0][0] and therefore certainly not on the same cache line. That is why iterating like this is inefficient:
#define ARRLEN 10000
int array[ARRLEN][ARRLEN];
size_t i, j;
for (i = 0; i < ARRLEN; ++i)
{
    for (j = 0; j < ARRLEN; ++j)
    {
        array[j][i] = 0;
    }
}
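For reference, here is my own sketch of what I believe the cache-friendly version looks like (not from the book): the subscripts are swapped so the innermost loop walks the contiguous last dimension:

#define ARRLEN 10000
int array[ARRLEN][ARRLEN];
size_t i, j;
for (i = 0; i < ARRLEN; ++i)
{
    for (j = 0; j < ARRLEN; ++j)
    {
        /* j varies fastest, so consecutive stores touch adjacent
           bytes and reuse the cache line loaded by the first access */
        array[i][j] = 0;
    }
}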
Can you explain what they are trying to say in this paragraph, and what is the "cache fault" thing they are talking about?
Upvotes: 2
Views: 436
Reputation: 212208
Think of the array as pages in a book. If each page holds 1024 characters, then an array declared as a[100][1024] is like a 100-page book. It is more efficient to read the book page by page: you iterate in the order a[0][0], a[0][1], ..., a[0][1023], a[1][0], i.e., you read a full page and then turn the page. If you instead iterate over the leftmost index, it is like reading one character from each page, turning the page after every single character, and going back to page 1 when you reach the end of the book to read the second character of each page. Turning the page is a cache fault.
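If you want to see the page-turning cost directly, you can time both traversal orders. A minimal sketch, assuming a 4096x4096 matrix (the size, the flat malloc'd layout, and the clock()-based timing are my choices for illustration, not anything from the book):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096  /* assumed size: 4096*4096 ints = 64 MiB */

int main(void)
{
    /* One contiguous block indexed as a[i*N + j], which is
       exactly the row-major layout of int a[N][N]. */
    int *a = malloc((size_t)N * N * sizeof *a);
    if (!a)
        return 1;

    clock_t t0 = clock();
    for (size_t i = 0; i < N; ++i)      /* row-major: reads each "page" in order */
        for (size_t j = 0; j < N; ++j)
            a[i * N + j] = 0;

    clock_t t1 = clock();
    for (size_t i = 0; i < N; ++i)      /* column-major: "turns the page" on every access */
        for (size_t j = 0; j < N; ++j)
            a[j * N + i] = 0;

    clock_t t2 = clock();
    printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("column-major: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}

Compile without heavy optimization (e.g. gcc -O1), since at higher levels the compiler may replace the zeroing loop with memset or interchange the loops and mask the difference; on typical hardware the row-major pass finishes several times faster.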
Upvotes: 13