Reputation: 9735
In my program I have a big 2-D grid (1000 x 1000, or more); each cell contains a small piece of information. To represent this concept the choice is quite natural: a matrix data structure.
The corresponding code (in C++) is something like:
int w_size_grid = 1000;
int h_size_grid = 1000;
int* matrix = new int[w_size_grid * h_size_grid];
As you can notice, I've actually used a flat one-dimensional array, but the principle is the same.
In order to access an element of the grid, we need a function that, given a cell identified by (x, y), returns the value stored in that cell.
Mathematically:
f(x, y) -> Z
obviously:
f: Z^2 -> Z
where Z is the set of integer numbers.
That can be trivially achieved with a linear function. Here is a C++ representation:
int get_value(int x, int y) {
    return matrix[y * w_size_grid + x];
}
Actually, the design concept requires a sort of "circular-continuous grid": the access indices for a cell may go beyond the limits of the grid itself.
That means, for example, that the particular call get_value(-1, -1);
is still valid. The function will just return the same value as get_value(w_size_grid - 1, h_size_grid - 1);.
Actually this is not a problem in the implementation:
int get_value(int x, int y) {
    adjust_xy(&x, &y); // modify x and y in accordance with that rule.
    return matrix[y * w_size_grid + x];
}
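For completeness, one way adjust_xy could be implemented (a sketch; the post only declares it) is with a double modulo, which compensates for the sign of C++'s remainder on negative operands:

```cpp
const int w_size_grid = 1000;
const int h_size_grid = 1000;

// Wrap x and y into [0, size): -1 maps to size - 1, size maps to 0.
// The "+ size" before the second % is needed because the C++ %
// operator yields a negative remainder for negative operands.
void adjust_xy(int* x, int* y) {
    *x = ((*x % w_size_grid) + w_size_grid) % w_size_grid;
    *y = ((*y % h_size_grid) + h_size_grid) % h_size_grid;
}
```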
Anyway, this is just an additional note to make the scenario clearer.
The problem presented above is very simple to design and to implement.
My problem comes from the fact that the matrix is updated at a high frequency: each cell in the matrix is read and possibly updated with a new value.
Obviously the matrix is traversed with two loops, according to a cache-friendly design:
for (int y = 0; y < h_size_grid; ++y) {
    for (int x = 0; x < w_size_grid; ++x) {
        int value = get_value(x, y);
    }
}
The inner loop iterates over x, since [x-1], [x], [x+1] are stored contiguously. Indeed, that loop exploits the principle of locality.
The problem arises now because the updated value of a cell depends on the values in the adjacent cells.
Each cell has exactly eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent.
(-1,-1) | (0,-1) | (1,-1)
-------------------------
(-1, 0) | (0, 0) | (1, 0)
-------------------------
(-1, 1) | (0, 1) | (1, 1)
So the code is intuitively:
for (int y = 0; y < h_size_grid; ++y) {
    for (int x = 0; x < w_size_grid; ++x) {
        int value = get_value(x, y);
        auto values = get_value_all_neighbours(x, y); // the 8 neighbouring integers
    }
}
The function get_value_all_neighbours(x, y)
has to access one row above and one row below the current y.
Since a row of the matrix is quite big, this involves cache misses and pollutes the caches themselves.
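For concreteness, a minimal self-contained sketch of what get_value_all_neighbours could look like (the helper bodies are assumptions; the post does not show them):

```cpp
#include <array>

const int w_size_grid = 1000;
const int h_size_grid = 1000;
int* matrix = new int[w_size_grid * h_size_grid]();  // zero-initialised

// Assumed wrap-around helper (not shown in the original post).
void adjust_xy(int* x, int* y) {
    *x = ((*x % w_size_grid) + w_size_grid) % w_size_grid;
    *y = ((*y % h_size_grid) + h_size_grid) % h_size_grid;
}

int get_value(int x, int y) {
    adjust_xy(&x, &y);
    return matrix[y * w_size_grid + x];
}

// Gather the eight neighbours row by row. The six accesses on rows
// y - 1 and y + 1 touch memory ~4 KB away and are therefore the
// likely cache misses.
std::array<int, 8> get_value_all_neighbours(int x, int y) {
    std::array<int, 8> values;
    int i = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            if (dx != 0 || dy != 0)
                values[i++] = get_value(x + dx, y + dy);
    return values;
}
```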
Now that I have finally presented the scenario and the problem, my question is how to "solve" it.
Using some additional data structure, or reorganizing the data, is there a way to exploit the caches and avoid all those misses?
My feeling guides me toward a strategic data structure.
I've thought about changing the order in which the values are stored in the array, trying to store at contiguous indices those cells which are neighbours.
That implies a no-longer-linear function for get_value.
After some thinking, I believe it is not possible to find such a non-linear function.
I've also thought about an additional data structure, like a hash table storing the adjacent values of each cell, but I think that would be overkill in space and maybe in CPU cycles as well.
Upvotes: 4
Views: 1235
Reputation: 1899
Let's assume you do indeed have a problem with cache misses that can't easily be avoided (referring to the other answers here).
You could use a space-filling curve to organize your data in a cache-friendly way. Essentially, space-filling curves map a volume or plane (such as your matrix) to a linear representation, such that values that are close together in space end up (mostly) close together in the linear representation. In effect, if you store the matrix in a z-ordered array, neighbouring elements have a high likelihood of being on the same memory page.
The best proximity mapping is given by the Hilbert curve, but it is expensive to calculate. A better trade-off may be a z-curve (Morton order): it provides good proximity and is fast to calculate.
Z-curve: Essentially, to get the ordering, you interleave the bits of your x and y coordinates into a single value, called the 'z-value'. This z-value determines the position in your list of matrix values (or even simply the index in an array, if you use an array). The z-values are consecutive for a completely filled matrix (every cell is used, with power-of-two dimensions). Conversely, you can de-interleave a position in the list (= array index) and get your x/y coordinates back.
Interleaving the bits of two values is quite fast; there are even dedicated CPU instructions that do this in a few cycles. If you can't find these (I can't, at the moment), you can simply use some bit-twiddling tricks.
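For reference, the classic "binary magic numbers" bit-twiddling version of the interleave for 16-bit coordinates might look like this (a sketch, not taken from the answer):

```cpp
#include <cstdint>

// Spread the lower 16 bits of v apart, inserting a zero bit between
// each pair: ...dcba -> ...0d0c0b0a.
uint32_t part1by1(uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Undo part1by1: collect every other bit back into the low half.
uint32_t compact1by1(uint32_t v) {
    v &= 0x55555555;
    v = (v | (v >> 1)) & 0x33333333;
    v = (v | (v >> 2)) & 0x0F0F0F0F;
    v = (v | (v >> 4)) & 0x00FF00FF;
    v = (v | (v >> 8)) & 0x0000FFFF;
    return v;
}

// z-value: bits of x in the even positions, bits of y in the odd ones.
uint32_t morton_encode(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Recover the original coordinates from a z-value.
void morton_decode(uint32_t z, uint32_t* x, uint32_t* y) {
    *x = compact1by1(z);
    *y = compact1by1(z >> 1);
}
```

With this layout, the cell (x, y) lives at index morton_encode(x, y) instead of y * w_size_grid + x.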
Upvotes: 2
Reputation: 57749
Actually, the data structure is not trivial, especially when optimization is a concern.
There are two main issues to resolve: data content and data usage. Data content is the values in the data; data usage is how the data is stored and retrieved, and how often.
Are all the values accessed? Frequently?
Data that is not accessed frequently can be pushed to slower media, including files. Leave the fast memory (such as data caches) for the frequently accessed data.
Is the data similar? Are there patterns?
There are alternative methods for representing matrices where a lot of the data is the same (such as a sparse matrix or a lower triangular matrix). For large matrices, maybe performing some checks and returning constant values may be faster or more efficient.
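For instance, a minimal sparse-grid sketch (the names and layout are hypothetical, not from the answer), where only the non-default cells are stored and every other read returns a constant:

```cpp
#include <cstdint>
#include <unordered_map>

// Only cells that differ from `default_value` occupy memory; any
// other (x, y) reads back as the constant without being stored.
struct SparseGrid {
    std::unordered_map<uint64_t, int> cells;
    int default_value = 0;

    // Pack the two 32-bit coordinates into one 64-bit hash key.
    static uint64_t key(int x, int y) {
        return (static_cast<uint64_t>(static_cast<uint32_t>(y)) << 32)
             | static_cast<uint32_t>(x);
    }

    void set(int x, int y, int v) { cells[key(x, y)] = v; }

    int get(int x, int y) const {
        auto it = cells.find(key(x, y));
        return it == cells.end() ? default_value : it->second;
    }
};
```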
Data usage is a key factor in determining an efficient structure for the data. Even with matrices.
For example, for frequently accessed data, a map or associative array may be faster.
Sometimes, using many local variables (i.e. registers) may be more efficient for processing matrix data. For example, load up registers with values first (data fetches), operate using the registers, then store the registers back into memory. For most processors, registers are the fastest media for holding data.
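As an illustration of that pattern (a sketch, not from the answer), here is a 1-D stencil that keeps its three live values in locals, so each row element is read from memory exactly once per pass:

```cpp
// Sum of the left and right neighbours for each interior cell of a
// row. The three live values stay in locals (likely registers): one
// fresh load per iteration, then the window slides.
void row_neighbour_sum(const int* row, int* out, int n) {
    if (n < 3) return;
    int left = row[0];
    int mid  = row[1];
    for (int x = 1; x + 1 < n; ++x) {
        int right = row[x + 1];  // the only memory read this iteration
        out[x] = left + right;   // compute purely from locals
        left = mid;              // slide the window one cell right
        mid  = right;
    }
}
```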
The data may need to be rearranged to make efficient use of data caches and cache lines. The data cache is a high-speed area of memory very close to the processor core. A cache line is one row of data in the data cache. An efficient matrix layout fits one or more rows per cache line.
The most efficient method is to perform as many accesses to a data cache line as possible, and to reduce the need to reload the data cache (because an index was out of range).
Can the operations be performed independently?
For example, scaling a matrix, where each location is multiplied by a value: these operations don't depend on other cells of the matrix, which allows them to be performed in parallel. If they can be performed in parallel, they can be delegated to processors with multiple cores (such as GPUs).
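A minimal sketch of such an independent per-cell operation (the two-thread split is an illustrative assumption, not part of the answer):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Scale every cell by `factor`. No cell depends on any other, so the
// matrix can be split into disjoint halves processed concurrently.
void scale_matrix(std::vector<int>& m, int factor) {
    auto worker = [&m, factor](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            m[i] *= factor;
    };
    std::size_t mid = m.size() / 2;
    std::thread other(worker, std::size_t{0}, mid);  // first half
    worker(mid, m.size());                           // second half here
    other.join();
}
```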
When a program is data driven, the choice of data structures is not trivial. The content and usage are important factors when choosing a structure for the data and how the data is aligned. Usage and performance requirements will also determine the best algorithms for accessing the data. There are already many articles on the internet for optimizing for data driven applications and best usage of data caches.
Upvotes: 1