Reputation: 3706
The author of that topic claims that, at least in C#, accessing a 1D array flattened from a 2D array with fixed lengths is much faster than accessing the original 2D array. I wonder whether this also applies to C/C++.
When using 3D arrays, the value at (x, y, z) is fetched by dereferencing the pointer to the array three times:
int val = arr[x][y][z];
But you can convert the array to a 1D array and compute an index from the coordinates, so the code changes into:
int val = arr[SIZE_X * SIZE_Y * z + SIZE_X * y + x];
This replaces the three dereferences with a single dereference plus three multiplications and two additions.
The question is: is dereferencing three times slower or faster than calculating the index from the coordinates?
Benchmark test output:
3 dimensions: 5s
1 dimension: 14s
1 dimension fast: 4s
code:
#include <iostream>
#include <time.h>

int main(int argc, char **argv)
{
    const int SIZE_X = 750, SIZE_Y = SIZE_X, SIZE_Z = SIZE_X;
    const int SIZE_XY = SIZE_X * SIZE_Y;
    time_t startTime;

    // 3 dimensions
    time(&startTime);
    int ***array3d = new int **[SIZE_X];
    for (int x = 0; x < SIZE_X; ++x)
    {
        array3d[x] = new int *[SIZE_Y];
        for (int y = 0; y < SIZE_Y; ++y)
            array3d[x][y] = new int[SIZE_Z];
    }
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array3d[x][y][z] = 0;
    for (int x = 0; x < SIZE_X; ++x)
    {
        for (int y = 0; y < SIZE_Y; ++y)
            delete[] array3d[x][y];
        delete[] array3d[x];
    }
    delete[] array3d; // was missing: the top-level array must be freed too
    std::cout << "3 dimensions: " << time(0) - startTime << "s\n";

    // 1 dimension
    time(&startTime);
    int *array1d = new int[SIZE_X * SIZE_Y * SIZE_Z];
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array1d[x + SIZE_X * y + SIZE_XY * z] = 0;
    delete[] array1d;
    std::cout << "1 dimension: " << time(0) - startTime << "s\n";

    // 1 dimension, sequential index
    time(&startTime);
    array1d = new int[SIZE_X * SIZE_Y * SIZE_Z];
    int i = 0;
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array1d[i++] = 0; // i++, not ++i: ++i skipped index 0 and wrote one past the end
    delete[] array1d;
    std::cout << "1 dimension fast: " << time(0) - startTime << "s\n";
    return 0;
}
Result: the 3D version is faster than the naive 1D version, and only a bit slower than the fast 1D version.
EDIT: I changed the 1 dimensional array loop to this:
for (int z = 0; z < SIZE_Z; ++z)
for (int y = 0; y < SIZE_Y; ++y)
for (int x = 0; x < SIZE_X; ++x)
array1d[x + SIZE_X * y + SIZE_XY * z] = 0;
And it took only 5 seconds, as fast as the 3D variant.
So it is the order of access that matters, not the number of dimensions. I think.
Upvotes: 2
Views: 357
Reputation: 30136
Why don't you simply check out the disassembly of each option and find out?
Of course, the disassembly depends on the compiler in use, which in turn depends on the CPU architecture and its supported operations.
That caveat is in fact the most important point here: each option may have advantages and disadvantages over the other, depending on your platform (compiler, linker, processor).
So without specifying the underlying platform, there may be no decisive answer to the general question.
The answer below is divided into two cases: arrays whose layout is one contiguous block with dimensions known at compile time, and dynamically allocated pointer-to-pointer arrays.
In each case, it examines both options (a 1D array and a 3D array), using the disassembly produced by Microsoft Visual C++ 2010 for a Pentium E5200 as an example.
#define X 10
#define Y 10
#define Z 10
int val = array3d[x][y][z];
mov eax,dword ptr [x]
imul eax,eax,190h
add eax,dword ptr [array3d]
mov ecx,dword ptr [y]
imul ecx,ecx,28h
add eax,ecx
mov edx,dword ptr [z]
mov eax,dword ptr [eax+edx*4]
mov dword ptr [val],eax
int val = array1d[x+X*y+X*Y*z];
mov eax,dword ptr [y]
imul eax,eax,0Ah
add eax,dword ptr [x]
mov ecx,dword ptr [z]
imul ecx,ecx,64h
add eax,ecx
mov edx,dword ptr [array1d]
mov eax,dword ptr [edx+eax*4]
mov dword ptr [val],eax
As you can see, the "math" is slightly different, but apart from that, the two options are practically identical. So the only thing that may affect performance here is runtime caching, though I don't see how either option has a clear advantage over the other in that regard.
#define X 10
#define Y 10
#define Z 10
int val = array3d[x][y][z];
mov eax,dword ptr [x]
mov ecx,dword ptr [array3d]
mov edx,dword ptr [ecx+eax*4]
mov eax,dword ptr [y]
mov ecx,dword ptr [edx+eax*4]
mov edx,dword ptr [z]
mov eax,dword ptr [ecx+edx*4]
mov dword ptr [val],eax
int val = array1d[x+X*y+X*Y*z];
mov eax,dword ptr [y]
imul eax,eax,0Ah
add eax,dword ptr [x]
mov ecx,dword ptr [z]
imul ecx,ecx,64h
add eax,ecx
mov edx,dword ptr [array1d]
mov eax,dword ptr [edx+eax*4]
mov dword ptr [val],eax
This time, the results are notably different, yet it's rather hard to determine which one (if any) is consistently better. With the 3D array there are notably more load (mov) operations than with the 1D array, so runtime performance here depends heavily on where each array resides in the memory hierarchy (RAM, L2 cache, etc.).
Upvotes: 1
Reputation: 18399
Sorry about the long answer.
It is mostly about memory access patterns. But first, a little about benchmarking: your timings include the new and delete calls; allocation and deallocation should happen outside the timed region.
Now back to arrays. First of all, in the given example memset should be used instead of reinventing the wheel. I understand it is there for testing purposes, but in that case it would be better to use e.g. rand() (although the array sizes should then be lowered, as rand() is much, much slower than = 0 and the test would take too long). No matter, here it goes:
In the 3-dimensional version, your innermost loop accesses a linear array. That is a very cache-friendly and fast access pattern. The dereferencing isn't performed on every loop iteration, because the compiler can see that the pointers don't change. So the most heavily used line of code, the innermost loop, accesses a linear memory array.
The 'fast' version of the 1D array does the same, so it is a good one too. memset is still better, though :-).
But in the 'slow' 1D version, things get messed up. Look at your index expression: array1d[x + SIZE_X * y + SIZE_XY * z] = 0;. The innermost loop iterates z, so each iteration jumps SIZE_XY ints (over 2 MB) ahead. That access pattern makes the data cache useless, and most of the time your program just waits for data to be written to memory. However, if you change the expression to array1d[SIZE_XY * x + SIZE_X * y + z] = 0;, it becomes a linear array access again, and therefore very fast. Plus, if you want, the left part of the addition can be computed in the outer loops, potentially making it a bit faster.
But the real strength of a 1D array is that it can be traversed linearly from start to end. If the algorithm that uses it can be rearranged to traverse the array that way, it's a win-win scenario.
If you want to test this, just change the [x][y][z]
order in your 3D version to [z][y][x]
and watch performance drop dramatically.
So, about the initial question: the answer is 'it depends'. Most of all it depends on the data access pattern, but also on many other things, such as the actual number of dimensions, the size of each dimension, the frequency of supporting operations like new/delete, and much more. But if you can linearise your data access, it will already be fast, and in that case you don't really need 3D indexing, right?
(Yes, I am obviously in favour of 1D arrays with a manually calculated index, so count me biased. Sorry.)
Upvotes: 3