Reputation: 3706
The author of that topic claims that, at least in C#, accessing a 1D array flattened from a 2D array with fixed lengths is much faster than accessing the original 2D array. I wonder whether this also applies to C/C++.
When using 3D arrays, the value at (x, y, z) is fetched by dereferencing the pointer to the array three times:
int val = arr[x][y][z];
But you can convert the array to a 1D array and compute an index from the coordinates, so the code changes into:
int val = arr[SIZE_X * SIZE_Y * z + SIZE_X * y + x];
This replaces the three dereferences with a single dereference plus three multiplications and two additions.
The question is: is dereferencing three times slower or faster than calculating the index from the coordinates?
Benchmark test output:
3 dimensions: 5s
1 dimension: 14s
1 dimension fast: 4s
code:
#include <iostream>
#include <time.h>

int main(int argc, char **argv)
{
    const int SIZE_X = 750, SIZE_Y = SIZE_X, SIZE_Z = SIZE_X;
    const int SIZE_XY = SIZE_X * SIZE_Y;
    time_t startTime;

    // 3 dimensions
    time(&startTime);
    int ***array3d = new int **[SIZE_X];
    for (int x = 0; x < SIZE_X; ++x)
    {
        array3d[x] = new int *[SIZE_Y];
        for (int y = 0; y < SIZE_Y; ++y)
            array3d[x][y] = new int[SIZE_Z];
    }
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array3d[x][y][z] = 0;
    for (int x = 0; x < SIZE_X; ++x)
    {
        for (int y = 0; y < SIZE_Y; ++y)
            delete[] array3d[x][y];
        delete[] array3d[x];
    }
    delete[] array3d; // was missing: the top-level array must be freed too
    std::cout << "3 dimensions: " << time(0) - startTime << "s\n";

    // 1 dimension
    time(&startTime);
    int *array1d = new int[SIZE_X * SIZE_Y * SIZE_Z];
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array1d[x + SIZE_X * y + SIZE_XY * z] = 0;
    delete[] array1d;
    std::cout << "1 dimension: " << time(0) - startTime << "s\n";

    // 1 dimension, sequential index
    time(&startTime);
    array1d = new int[SIZE_X * SIZE_Y * SIZE_Z];
    int i = 0;
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array1d[i++] = 0; // i++, not ++i: ++i skipped index 0 and wrote one past the end
    delete[] array1d;
    std::cout << "1 dimension fast: " << time(0) - startTime << "s\n";
    return 0;
}
Result: the 3D version is faster than the naive 1D version, and only a bit slower than the fast 1D version.
EDIT: I changed the 1 dimensional array loop to this:
for (int z = 0; z < SIZE_Z; ++z)
for (int y = 0; y < SIZE_Y; ++y)
for (int x = 0; x < SIZE_X; ++x)
array1d[x + SIZE_X * y + SIZE_XY * z] = 0;
And it took only 5 seconds, as fast as the 3D variant.
So it is the order of access that matters, not the number of dimensions. I think.
Upvotes: 2
Views: 357
Reputation: 30136
Why don't you simply check out the disassembly of each option and find out?
Of course, the disassembly depends on the compiler in use, which in turn depends on the CPU architecture and its supported operations.
That caveat is in fact the most important point here: each option may have advantages and disadvantages over the other, depending on your platform (compiler, linker, processor).
So without specifying the underlying platform, there may be no decisive answer to the general question.
The answer below is divided into two cases: arrays whose layout is one contiguous block with dimensions known at compile time, and dynamically allocated pointer-to-pointer arrays.
In each case, it examines both options (a 1D array and a 3D array), using the disassembly produced by Microsoft Visual C++ 2010 for a Pentium E5200 as an example.
#define X 10
#define Y 10
#define Z 10
int val = array3d[x][y][z];
mov eax,dword ptr [x]
imul eax,eax,190h
add eax,dword ptr [array3d]
mov ecx,dword ptr [y]
imul ecx,ecx,28h
add eax,ecx
mov edx,dword ptr [z]
mov eax,dword ptr [eax+edx*4]
mov dword ptr [val],eax
int val = array1d[x+X*y+X*Y*z];
mov eax,dword ptr [y]
imul eax,eax,0Ah
add eax,dword ptr [x]
mov ecx,dword ptr [z]
imul ecx,ecx,64h
add eax,ecx
mov edx,dword ptr [array1d]
mov eax,dword ptr [edx+eax*4]
mov dword ptr [val],eax
As you can see, the "math" is slightly different, but apart from that, the two options are practically identical. So the only thing that may affect performance here is runtime caching, though I don't see how either option has a clear advantage over the other in that regard.
#define X 10
#define Y 10
#define Z 10
int val = array3d[x][y][z];
mov eax,dword ptr [x]
mov ecx,dword ptr [array3d]
mov edx,dword ptr [ecx+eax*4]
mov eax,dword ptr [y]
mov ecx,dword ptr [edx+eax*4]
mov edx,dword ptr [z]
mov eax,dword ptr [ecx+edx*4]
mov dword ptr [val],eax
int val = array1d[x+X*y+X*Y*z];
mov eax,dword ptr [y]
imul eax,eax,0Ah
add eax,dword ptr [x]
mov ecx,dword ptr [z]
imul ecx,ecx,64h
add eax,ecx
mov edx,dword ptr [array1d]
mov eax,dword ptr [edx+eax*4]
mov dword ptr [val],eax
This time, the results are notably different, yet it's rather hard to determine which one (if any) is consistently better. With the 3D array there are notably more load (mov) operations than with the 1D array, so runtime performance here depends heavily on where each array resides in the memory hierarchy (RAM, L2 cache, etc.).
Upvotes: 1
Reputation: 18399
Sorry about the long answer.
It is mostly about memory access patterns. But first, a little about benchmarking: your timings include the new and delete calls; allocation and deallocation should happen outside the timed region.
Now back to arrays. First of all, in the given example memset should be used instead of reinventing the wheel. I understand it is there for testing purposes, but in that case it would be better to use e.g. rand() (although the array sizes should then be lowered, as rand() is much, much slower than = 0 and the test would take too long). No matter, here it goes:
In the 3-dimensional version, your innermost loop accesses a linear array. That is a very cache-friendly and fast access pattern. The dereferencing isn't performed on every loop iteration, because the compiler can see that the pointers don't change. So the most heavily used line of code, the innermost loop, accesses a linear memory array.
The 'fast' version of the 1D array does the same, so it is a good one too. memset is still better, though :-).
But in the 'slow' 1D version, things get messed up. Look at your index expression: array1d[x + SIZE_X * y + SIZE_XY * z] = 0;. The innermost loop iterates z, so each iteration jumps SIZE_XY ints (over 2 MB) ahead. That access pattern makes the data cache useless, and most of the time your program just waits for data to be written to memory. However, if you change the expression to array1d[SIZE_XY * x + SIZE_X * y + z] = 0;, it becomes a linear array access again, and therefore very fast. Plus, if you want, the left part of the addition can be computed in the outer loops, potentially making it a bit faster.
But the real strength of a 1D array is that it can be traversed linearly from start to end. If the algorithm that uses it can be rearranged to traverse the array that way, it's a win-win scenario.
If you want to test this, just change the [x][y][z]
order in your 3D version to [z][y][x]
and watch performance drop dramatically.
So, about the initial question: the answer is 'it depends'. Most of all it depends on the data access pattern, but also on many other things, such as the actual number of dimensions, the size of each dimension, the frequency of supporting operations like new/delete, and much more. But if you can linearise your data access, it will already be fast, and in that case you don't really need 3D indexing, right?
(Yes, I am obviously in favour of 1D arrays with a manually calculated index, so count me biased. Sorry.)
Upvotes: 3