Reputation: 6695
I'm trying to figure out how to calculate the miss rate of an array. I have the answer, but I'm not understanding how the answer was arrived at. I have the following code:
int C[N1][N2];
int A[N1][N3];
int B[N3][N2];
initialize_arrays(A, B, C, N1, N2, N3);
for(i=0; i<N1; ++i)
for(j=0; j<N2; ++j)
for(k=0; k<N3, ++k)
C[i][j] += A[i][k] * B[k][j];
I also have the following info: N1=N2=N3=2048
(what does this mean??). The processor has an L1 data cache of 32kB with line size of 64B (no L2 cache). (what is line size?)
I know the miss rate of array C is (N^2/16)/N^3
. I know the formula is (total misses)/(total accesses)
. I see that there are N^3
total accesses, but how did they get the total misses?
I also know the miss rate of array B: (N^3)/(N^3)
and A: (N^2/16)/N^3
. Could someone please explain to me how they got the total misses here too?
Upvotes: 2
Views: 7536
Reputation: 182
Cache is always accessed / managed with the granularity of a line. Here the line size is 64B. That means that in one cache line there will be 16 elements in one cache line (64B / 4B = 16). Next thing that is important here is every 17th element is in a new line, and once a line is brought to cache, it can possibly give 15 hits.
Having got that out of the way, let us look at the access pattern. A[0][0], B[0][0], C[0][0], A[0][1], B[1][0], C[0][0], A[0][2], B[2][0], C[0][0], ...... A[0][16], B[16][0], C[0][0], A[0][17], B[17][0], C[0][0], ...... and so on
Now Since a line has 16 elements and it is safe to assume a row-major placement of elements in the memory A[0][0] - A[0][15] will belong to one the first cache line (let us call this AL1) and similarly let B[0][0] - B[0][15] belong to line BL1. From this information we can write the access pattern in terms of cache lines. AL1, BL1, CL1, AL1, BL129, CL1, AL1, BL257, CL1, ..... AL1, BL2049, CL1, AL2, BL2176, CL1,.... and so on.
Now the cache size is 32kB that means the cache can hold 512 lines. Since this is L1, let us assume it is a two-way associative cache. Therefore there are 256 sets. This means that after every 256 lines of an array, the 257th line maps to the same set as the 1st line.
The cache contents upon access to lines mapping to set 1 looks like this
1st Iteration - inner
Access to - AL1 - Miss
AL1
Access to - BL1 - Miss
AL1
BL1
Access to - CL1 - Miss
CL1
BL1
2nd Iteration - inner
Access to - AL1 - Miss
CL1
AL1
Access to - CL1 - Hit
CL1
AL1
3rd Iteration - inner
Access to - AL1 - Hit
CL1
AL1
Access to - BL257 - Miss
BL257
AL1
Access to - CL1 - Miss
BL257
CL1
4th Iteration - Inner
Access to - AL1 - Miss
AL1
CL1
Access to - CL1 - Hit
AL1
CL1
5th Iteration - Inner
Access to - AL1 - Hit
CL1
AL1
Access to - BL513 - Miss
BL513
AL1
Access to - CL1 - Miss
BL513
CL1
. . .
16th Iteration - Inner
Access to - AL1 - Miss
AL1
CL1
Access to - CL1 - Hit
AL1
CL1
17th Iteration - Inner
Access to - BL2049 - Miss
BL2049
CL1
Access to - CL1 - Hit
BL2049
CL1
18th Iteration - Inner
Access to - CL1 - Hit
BL2049
CL1
For the 1st iteration of the middle loop. We have for set 1,
A -> M, M, H, M, H, M, H, M, H, M, H, M, H , M, H, M, ~ , ~ , ~ , ~ , .......
=> after 2048 iterations - 7 hits 9 misses, 16 accesses
B -> M, ~ , M, ~ , M, ~ , ........
=> after 2048 iterations - 0 Hits, 1024 misses, 1024 accesses
C -> M, H, M, H, M, H, M, H, M, H, M, H, M, H, M, H, H , H, H , H, ......
=> after 2048 iterations - 2040 hits, 8 misses, 2048 accesses
for 2nd - 16th iteration of the middle loop,
Exact hit / miss / access pattern as before.
for 17th - 2048th iteration of the middle loop,
A -> same as before
B -> No accesses
C -> No accesses
To summarize - for 1 iteration of the outer loop, we have for set 1,
A -> N*9 misses , N * 16 accesses
B -> N/2 * 16 Misses , N/2 * 16 accesses
C -> 8 * 16 Misses , N * 16 accesses
In the outer loop, we can see that every alternate iteration will have the same hit / miss / access pattern as the 1st iteration (summarized above) for arrays C and A. Every iteration of the outer loop will have the same hit / miss / access pattern a the 1st iteration for array B.
Hence, (Finally!)
For Set 1, at the end of the program, we have
A -> N^2 * 9 / 2 misses, N^2 * 8 accesses
B -> N^2 * 8 misses, N^2 * 8 accesses
C -> N * 64 misses, N^2 * 8 accesses
The access pattern will be similar across all sets. Therefore by the end of the program we have, across all sets
A -> N^2 * 1152 misses, N^3 accesses
B -> N^3 misses, N^3 accesses
C -> N^2 * 8 misses, N^3 accesses
I know this is different from what is indicated in the question. I could not figure out how. I will be glad to hear other responses.
Upvotes: 2