user13845326
user13845326

Reputation:

In C++, is a 2D array slower in overall operations than a 1D array that behaves like a 2D array?

I just want to ask which is better to implement in terms of performance and efficiency when dealing with 2D arrays, and possibly code complexity since we need to make a matrix library for our group project.

I have a code below (which I think may not be a good example, and I also have a limited ram that's why)

#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

int main(){

    auto start = high_resolution_clock::now();
    auto arr2d = new unsigned long long int[10000ul][10000ul];

    int cnt = 1;
    for(unsigned long long int i=0; i<10000ul; ++i){
        for(unsigned long long int j=0; j<10000ul; ++j){
            arr2d[i][j] = cnt;
            cnt++;
        }
    }
    auto stop = high_resolution_clock::now();   
    auto duration = duration_cast<microseconds>(stop-start);
    cout<<duration.count()<<endl;

    auto start1 = high_resolution_clock::now();
    auto arr1d = new unsigned long long int[100000000ull];

    int cnt1 = 1;
    for(unsigned long long int i=0; i<10000ul; ++i){
        for(unsigned long long int j=0; j<10000ul; ++j){
            arr1d[(i*10000ul)+j] = cnt1;
            cnt1++;
        }
    }
    auto stop1 = high_resolution_clock::now();  

    auto duration1 = duration_cast<microseconds>(stop1-start1);
    cout<<duration1.count()<<endl;

    return 0;
}

and it shows that the performance hasn't really have that much of a difference?

can anyone give us an advice with explanation on which route should we take?

Upvotes: 1

Views: 190

Answers (2)

JaMiT
JaMiT

Reputation: 16873

The performance of the two approaches is expected to be identical.

A 2D array, like unsigned long long int[10000ul][10000ul] occupies a single contiguous block of memory, the same as a 1D array with the same number of elements, like unsigned long long int[100000000ull]. No difference here.

With an eye towards accessing elements in the array, let's simplify the notation by assuming sizeof(unsigned long long) is 8. When you access arr2d[i][j], the compiler adds (i*10000ul + j)*8 to the starting address of arr2d to find that element. When you access arr1d[(i*10000ul)+j], the compiler adds ((i*10000ul)+j)*8 to the starting address of arr1d to find that element. No difference here.

There is no difference in the low-level implementation of the two approaches, so you should use whatever is more convenient. However, before leveraging this answer, be sure you are using a 2D array (as in T [M][N]) and not an array of pointers to arrays (as in T* [M]). The notation for using one is often indistinguishable from that for the other, yet the array of pointers uses more memory and does not guarantee that everything is contiguous. Whether or not this measurably affects performance in your code depends on which operations are performed. (While the question does show a true 2D array, future readers may benefit from this caveat.)

Speaking of convenience, you might find that std::vector is more convenient than managing memory yourself. If you opt to use a vector, though, you may want to stick with the 1D approach. A nested vector, as in std::vector< std::vector<unsigned long long int>>, is analogous to an array of pointers, so you lose the benefits of a single contiguous block of memory.

Upvotes: 1

Junmin Hao
Junmin Hao

Reputation: 151

In your particular case, both of your for loops are actually accessing the memory sequentially, even in the 2-D array case (due to the memory layout of the 2-D array), so there is no real difference. To see some difference execution speed, try the following pattern:

for(unsigned long long int i=0; i<10000ull; ++i){
    for(unsigned long long int j=0; j<10000ull; ++j){
        arr2d[j][i] = 1-1*3+3/4;  // note i and j are switched.
    }
}

When you access the memory sequentially, the CPU is smart enough to prefetch the memory content into the CPU's L1 and L2 caches. But in the modified version, the access to the memory is not sequential -- it jumps 10000*8 bytes ahead each time, and the CPU won't be able to prefetch that far ahead, so the code will run slower.

Another thing to note in your code is that the expression 1-1*3+3/4 will actually be precomputed by the compiler because it contains only constants, so your code will become only a series of assignments.

Upvotes: 2

Related Questions