Reputation: 480
The following code will run on a CPU with the following cache structure:
L1 cache: 1KB
L2 cache: 8KB
L3 Cache: 64KB
Block size: 16B
unsigned int A[65536];
int i,j;
for (i=0 ; i<65536-256 ; i++)
for (j=1 ; j<128; j++)
A[i]=A[i]+A[i+j];
I am studying for my midterm exam there is a question. Modify this code to minimise the number of penalties. Calculate the number of cache hits and cache misses in terms of L1, L2 and L3 caches.
I try to solve it via Loop Interchange. If I change to code like below, so there will be a sequential accesses instead of striding through memory every 16,364 word.
unsigned int A[65536];
int i,j;
for (j=1 ; j<128; j++)
for (i=0 ; i<65536-256 ; i++)
A[i]=A[i]+A[i+j];
but I stuck with the cache hits and misses. Can someone explain to me?
Upvotes: 3
Views: 1447
Reputation: 1221
Assuming unsigned int
is 4 bytes, the size of your array is 4 * 65536 = 256KB. Your L3 cache can only hold upto 64KB.
In order to minimize penalties, the first thing you should do is to break the loop into 4 sub groups, so that once you load an entry to L3, you should use it completely before being evicted.
unsigned int A[65536];
int i,j,k;
for (k=0 ; k<65536-256; k+=16384)
for (j=1 ; j<128; j++)
for (i=k ; i<MIN(k+16384,65536-256) ; i++) //define a MIN function to return minimum
A[i]=A[i]+A[i+j];
Now to calculate cache hits and misses, a cacheline can hold 4 entries of the array. When you access A[0] for the first time, it will be a miss in L1, L2 and L3. When fetched from memory, you don't just fetch A[0], you will also fetch A[1], A[2] and A[3] as a cacheline can hold all 4 of them.
In the same instruction, you also access A[i+j], in this case will be A[1] which will be a hit. So it goes like,
First iteration
A[i] - A[0] - Miss
A[i+j] - A[1] - Hit
Second Iteration
A[i] - A[1] - Hit
A[i+j] - A[2] - Hit
Third Iteration
A[i] - A[2] - Hit
A[i+j] - A[3] - Hit
Forth Iteration
A[i] - A[3] - Hit
A[i+j] - A[4] - Miss // This will cause to fetch A[4], A[5], A[6], A[7]
An the pattern continues until L1 is filled.
Upvotes: 3