Emil Kabirov

Reputation: 569

C++ compiler optimization of a loop that copies through pointers

I compiled this code on godbolt.org with -O2, and the compilers don't replace the loop with a call to memcpy; they honestly run the loop.

void foo(int* dst, int* src, int n)
{
    for (int i = 0; i < n; ++i)
    {
        dst[i] = src[i];
    }
}

But if I replace "= src[i]" with "= 0", they do use memset. And when I replace it with "= 1", they run a loop again. Why do they avoid memcpy here, and memset when the value to be set is not zero? I thought this would be one of the first optimizations they perform.

Upvotes: 2

Views: 492

Answers (2)

Jérôme Richard

Reputation: 50453

To complement @MilesBudnek's good answer:

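memset works at byte granularity, whereas you are working with int, which is generally larger than 1 byte (typically 4 bytes). Filling with 0 works because every byte of the int value 0 is zero, but the bytes of the int value 1 are not all the same, so the compiler cannot simply replace the assignment = 1 with a memset.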
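To make the byte-granularity point concrete, here is a minimal sketch. The helper name fill_bytes is hypothetical, and std::uint32_t is used instead of int only so that the width is fixed at 4 bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: fill one 32-bit integer the way memset does,
// i.e. every byte of the object receives the same value.
std::uint32_t fill_bytes(unsigned char byte_value)
{
    std::uint32_t x;
    std::memset(&x, byte_value, sizeof x);
    return x;
}
```

fill_bytes(0) yields 0, which matches "= 0", but fill_bytes(1) yields 0x01010101 rather than 1. Only values whose bytes are all identical can be produced this way.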

Note also that -O2 does not enable vectorization in older GCC releases (GCC 12 and later enable it at -O2 with a conservative cost model), although it apparently does in Clang. With those older releases, -ftree-vectorize (included in -O3) is needed for GCC to generate much faster SIMD instructions (though still not as fast as memcpy/memmove/memset on many platforms).

Upvotes: 0

Miles Budnek

Reputation: 30569

The ranges pointed to by src and dst may overlap, in which case the behavior of memcpy would be undefined. Thus optimizing this function to just call memcpy isn't appropriate.


memmove is well defined for overlapping ranges, but its behavior differs from your function's when the src and dst ranges overlap. Consider the following:

int arr[5] = {1, 2, 3, 4, 5};
foo(arr + 1, arr, 4);

Your function would result in arr containing {1, 1, 1, 1, 1} after the call, while memmove is specified to result in arr containing {1, 1, 2, 3, 4}. Thus the compiler can't optimize foo to a call to memmove either.
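The difference can be checked directly. Below is a small sketch that runs both variants on the array from the example; the wrapper names run_loop and run_memmove are made up for illustration.

```cpp
#include <array>
#include <cstring>

// The loop from the question, unchanged.
void foo(int* dst, int* src, int n)
{
    for (int i = 0; i < n; ++i)
    {
        dst[i] = src[i];
    }
}

// Hypothetical wrapper: apply the loop to overlapping ranges.
std::array<int, 5> run_loop()
{
    std::array<int, 5> arr{1, 2, 3, 4, 5};
    foo(arr.data() + 1, arr.data(), 4);
    return arr;
}

// Hypothetical wrapper: apply memmove to the same overlapping ranges.
std::array<int, 5> run_memmove()
{
    std::array<int, 5> arr{1, 2, 3, 4, 5};
    std::memmove(arr.data() + 1, arr.data(), 4 * sizeof(int));
    return arr;
}
```

The loop reads each element after the previous iteration has already overwritten it, so the first value propagates through the array. memmove behaves as if it copied the source to a temporary buffer first.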


C added the restrict keyword in C99 to tell the compiler that two ranges won't overlap, but C++ has not adopted that particular feature.
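As a sketch of a workaround: GCC, Clang and MSVC accept __restrict as a non-standard extension in C++. The function name foo_restrict is hypothetical, and whether a given compiler actually emits a memcpy call for it depends on the compiler, its version and the flags.

```cpp
// __restrict is a compiler extension, not standard C++.
// It promises that dst and src never alias, so the caller must not
// pass overlapping ranges (doing so is undefined behavior).
void foo_restrict(int* __restrict dst, const int* __restrict src, int n)
{
    for (int i = 0; i < n; ++i)
    {
        dst[i] = src[i];
    }
}
```

With the no-overlap promise in place, the compiler is free to treat the loop as a plain copy.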

Upvotes: 3
