Is copying a local array faster than copying an array passed as an argument in C++?

Question

While optimizing some code, I discovered something I didn't expect. I wrote a simple program to illustrate what I found:

#include <iostream>
#include <chrono>
#include <cstring>

using namespace std;

int globalArr[1024][1024];

void initArr(int arr[1024][1024])
{
    memset(arr, 0, 1024 * 1024 * sizeof(int));
}


void run()
{
    int arr[1024][1024];
    initArr(arr);
    for(int i = 0; i < 1024; ++i)
    {
        for(int j = 0; j < 1024; ++j)
        {
            globalArr[i][j] = arr[i][j];
        }

    }
}

void run2(int arr[1024][1024])
{
    initArr(arr);
    for(int i = 0; i < 1024; ++i)
    {
        for(int j = 0; j < 1024; ++j)
        {
            globalArr[i][j] = arr[i][j];
        }

    }
}

int main()
{
    {
        auto start = chrono::high_resolution_clock::now();
        for(int i = 0; i < 256; ++i)
        {
            run();
        }
        auto duration = chrono::high_resolution_clock::now() - start;
        cout << "(run) Total time: " << chrono::duration_cast<chrono::microseconds>(duration).count() << " microseconds\n";
    }

    {
        auto start = chrono::high_resolution_clock::now();
        for(int i = 0; i < 256; ++i)
        {
            int arr[1024][1024];
            run2(arr);
        }
        auto duration = chrono::high_resolution_clock::now() - start;
        cout << "(run2) Total time: " << chrono::duration_cast<chrono::microseconds>(duration).count() << " microseconds\n";
    }

    return 0;
}

I built the code with g++ 6.4.0 20180424 using the -O3 flag. Below are the results on a Ryzen 1700:

(run) Total time: 43493 microseconds
(run2) Total time: 134740 microseconds
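
For anyone reproducing these numbers, the two timing blocks in main could be folded into one helper so both cases are measured the same way. This is only a minimal sketch: `timeMicros` is a hypothetical name that is not in the code above, and it assumes the function under test takes no arguments.

```cpp
#include <chrono>

// Hypothetical helper (not part of the original program): calls fn()
// `iterations` times and returns the total elapsed time in microseconds.
template <typename Fn>
long long timeMicros(Fn fn, int iterations)
{
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        fn();
    }
    auto duration = std::chrono::high_resolution_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(duration).count();
}
```

With this, the first case would be `timeMicros(run, 256)`, and the second could wrap the local array and the `run2(arr)` call in a lambda.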

I looked at the assembly on godbolt.org (the code is split across two URLs):

https://godbolt.org/g/aKSHH6

https://godbolt.org/g/zfK14x

But I still don't understand what actually made the difference.

So my questions are:

1. What is causing the performance difference?
2. Is it possible to pass an array as an argument and get the same performance as with a local array?
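
For the second question, one variant that could be tried is passing the array by reference instead of letting the parameter decay to a pointer. In `run2`, the parameter `int arr[1024][1024]` is adjusted to `int (*arr)[1024]`, whereas a reference parameter keeps both dimensions in its type. This is a sketch only: `run2Ref` is a hypothetical name, and whether it changes the timings has not been measured here.

```cpp
#include <cstring>

int globalArr[1024][1024];

// Hypothetical variant of run2: takes a reference to the whole array, so the
// parameter type stays int[1024][1024] instead of decaying to int (*)[1024].
void run2Ref(int (&arr)[1024][1024])
{
    // sizeof(arr) is the size of the full array here (1024 * 1024 ints).
    std::memset(arr, 0, sizeof(arr));
    for (int i = 0; i < 1024; ++i)
    {
        for (int j = 0; j < 1024; ++j)
        {
            globalArr[i][j] = arr[i][j];
        }
    }
}
```

The call site is unchanged (`run2Ref(arr)`), but only arrays of exactly this size are accepted.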

Edit: Some extra info. Below are the results when building with -O2:

(run) Total time: 94461 microseconds
(run2) Total time: 172352 microseconds

Edit again: Following xaxxon's comment, I tried removing the initArr call from both functions. With that change, run2 is actually faster than run:

(run) Total time: 45151 microseconds
(run2) Total time: 35845 microseconds

But I still don't understand the reason.
