M.J.Watson
M.J.Watson

Reputation: 480

The number of cache hits and cache misses in terms of L1, L2 and L3 caches

The following code will run on a CPU with the following cache structure:

I am studying for my midterm exam there is a question. Modify this code to minimise the number of penalties. Calculate the number of cache hits and cache misses in terms of L1, L2 and L3 caches.

I try to solve it via Loop Interchange. If I change to code like below, so there will be a sequential accesses instead of striding through memory every 16,364 word.

     unsigned int A[65536];
     int i,j;
     for (j=1 ; j<128; j++)
     for (i=0 ; i<65536-256 ; i++)
     A[i]=A[i]+A[i+j]; 

but I stuck with the cache hits and misses. Can someone explain to me?

Upvotes: 3

Views: 1447

Answers (1)

Isuru H
Isuru H

Reputation: 1221

Assuming unsigned int is 4 bytes, the size of your array is 4 * 65536 = 256KB. Your L3 cache can only hold upto 64KB.

In order to minimize penalties, the first thing you should do is to break the loop into 4 sub groups, so that once you load an entry to L3, you should use it completely before being evicted.

unsigned int A[65536];
int i,j,k;
for (k=0 ; k<65536-256; k+=16384)
    for (j=1 ; j<128; j++)
        for (i=k ; i<MIN(k+16384,65536-256) ; i++) //define a MIN function to return minimum
            A[i]=A[i]+A[i+j]; 

Now to calculate cache hits and misses, a cacheline can hold 4 entries of the array. When you access A[0] for the first time, it will be a miss in L1, L2 and L3. When fetched from memory, you don't just fetch A[0], you will also fetch A[1], A[2] and A[3] as a cacheline can hold all 4 of them.

In the same instruction, you also access A[i+j], in this case will be A[1] which will be a hit. So it goes like,

First iteration
    A[i]   - A[0] - Miss
    A[i+j] - A[1] - Hit
Second Iteration
    A[i]   - A[1] - Hit
    A[i+j] - A[2] - Hit
Third Iteration
    A[i]   - A[2] - Hit
    A[i+j] - A[3] - Hit
Forth Iteration
    A[i]   - A[3] - Hit
    A[i+j] - A[4] - Miss // This will cause to fetch A[4], A[5], A[6], A[7]

An the pattern continues until L1 is filled.

Upvotes: 3

Related Questions