cellka
cellka

Reputation: 129

Hit / Miss rate counting by array caching

I'm reading Computer Systems book from Bryant & O'Hallaron, there is an exercises the solution of which seems to be incorrect. So I'd like to make it sure

given

struct point {
  int x; 
  int y;  };


struct array[32][32];

for(i = 31; i >= 0; i--) {
   for(j = 31; j >= 0; j--) {
      sum_x += array[j][i].x;
      sum_y += array[j][i].y; }}

sizeof(int) = 4;

we have 4096 byte cache with block (line) size 32 byte. The hit rate is asked.

My reasoning was, we have 4096/32 = 128 blocks, each block can store 4 points (2*4*4 = 32), therefore the cache can store 1/2 of the array, i.e. 512 points (total 32*32 = 1024). Since the code accesses array in column major order, access to each point is miss. So we have array[j][i].x is always miss, while array[j][i].y is hit. Finally miss rate = hit rate = 1/2.

Problem: The solution says the hit rate is 3/4 because the cache can store the whole array.

But according to my reasoning the cache can store only half points

Did I miss something?

Upvotes: 3

Views: 2298

Answers (2)

ottomeister
ottomeister

Reputation: 5808

If you've transcribed the program correctly then you're correct, the 3/4 answer is wrong.

The 3/4 answer would be correct if the indexes in the innermost sum += ... statements were arranged so that the rightmost index varied the most quickly, i.e. as:

    sum_x += array[i][j].x;
    sum_y += array[i][j].y;

In that case the 1st, 5th, 9th ... iterations of the loop would miss, but the line loaded into the cache by each of those misses would cause the next three iterations to hit.

However, with the program as written, every iteration misses. Each cache line that is loaded from memory supplies data for only a single point, and then that line is always replaced before the data for any of the other three points in the line is accessed.

As an example (assuming for simplicity that the address of the first member array[0][0] is aligned with the start of the cache), the reference to array[31][31] in the first pass through the loop is a miss that causes line 127 of the cache to be loaded. Line 127 now contains the data for [31][28], [31][29], [31][30] and [31][31]. However, the fetch of array[15][31] causes line 127 to be overwritten before array[31][30] is referenced, so when [31][30]'s turn eventually arrives it is a miss too. And then a miss at [15][30] replaces the line before [31][29] is referenced.

IMO your 1/2 hit ratio is overgenerous because it counts the access to the .y coordinate as a hit. However, that's not what the original 3/4 answer does. If the fetch of the .y coordinate were counted as a hit then the original answer would have been 7/8. Instead it counts each complete point, or perhaps each loop iteration, as a hit or a miss. By that measure the hit rate for the program as written in your question is a nice round 0.

Upvotes: 1

thb
thb

Reputation: 14454

The array's top four rows occupy a part of the cache:

|*ooooooooooooooooooooooooooooooo|
|*ooooooooooooooooooooooooooooooo|
|*ooooooooooooooooooooooooooooooo|
|*ooooooooooooooooooooooooooooooo|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|...

Above is a schematic of the array as an applied mathematician would write the array on paper. Each element consists of an (x,y) pair, a point.

The four rows labeled o in the diagram comprise 128 points, enough to fill 1024 bytes, which is only one quarter of the cache, but see: in your code, the variable i is

  • the major loop counter and also
  • the array's row index (as written on paper).

So, let's look at the diagram again. How do your nested loops step through the array as diagrammed?

Answer: apparently, your loops step rightward across the top row as diagrammed, with j (column) as the minor loop counter. However, as you have observed, the array is stored by columns. Therefore, when element [j][i] == [0][0] is loaded, an entire cache line is loaded with it. And what comprises that cache line? It's the four elements marked * in the diagram.

Therefore, while your inner loop iterates across the array's top row as diagrammed, the cache misses every time, fetching four elements each time. And then for the next three rows, it's all hits.

This isn't easy to think about. It's a fine problem, nor would I expect you to grasp my answer instantly, but if you carefully consider the sequence of loads as I have explained, it should (after a bit of pondering) begin to make sense.

With the given loop nesting, the hit rate is indeed 3/4.

FURTHER DISCUSSION

In comments, you have asked a good follow-up question:

Can you write an element (e.g. array[3][14].x) that would hit?

I can. The array[j][i] == array[10][5] would hit. (Both .x and .y would hit.)

I will explain. The array[j][i] == array[10][4] would miss, whereas array[10][5], array[10][6] and array[10][7] would eventually hit. Why eventually? This is significant. Although all four of the elements I have named are loaded by cache hardware at once, array[10][5] is not accessed by your code (that is, by the CPU) when array[10][4] is accessed. Rather, after array[10][4] is accessed, array[11][4] is next accessed by the program and CPU.

The program and CPU only get around to accessing array[10][5] rather later.

And, indeed, if you think about it, this makes sense, doesn't it, because that is part of what caches do: they load additional data now, quietly as part of a cache line, so that the CPU can quickly access the additional data later if it needs it.

APPENDIX: FORTRAN/BLAS/LAPACK MATRIX ORDERING

It is standard in numerical computing to store matrices by column rather than by row. This is called column-major storage. Unfortunately, unlike the earlier Fortran programming language, the C programming language was not originally designed for numerical computing, so, in C, to store arrays by column, one must write array[column][row] == array[j][i]—which notation of course reverses the way an applied mathematician with his or her pencil would write it.

This is an artifact of the C programming language. The artifact has no mathematical significance but, when programming in C, you must remember to type [j][i]. [Were you programming in the now mostly obsolete Fortran programming language, you would type (i, j), but this isn't Fortran.]

The reason column-major storage is standard has to do with the sequence in which the CPU performs scalar, floating-point multiplications and additions when, in mathematical/pencil terminology, a matrix [A] left-operates on a column vector x. The standard Basic Linear Algebra Subroutines (BLAS) library, used by LAPACK and others, works this way. You and I should work this way, too, not only because we are likely to need to interface with BLAS and/or LAPACK but because, numerically, it's smoother.

Upvotes: 1

Related Questions