Reputation: 569
I compiled this code on godbolt.org with -O2, and the compilers don't turn it into a memcpy call; they just run the loop as written.
void foo(int* dst, int* src, int n)
{
    for (int i = 0; i < n; ++i)
    {
        dst[i] = src[i];
    }
}
But if I replace "= src[i]" with "= 0", they do use memset. And when I replace it with "= 1", they run a loop again. Why do they avoid memcpy and memset when the value being set is not zero? I thought that would be one of the first optimizations they perform.
Upvotes: 2
Views: 492
Reputation: 50453
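To complete the good answer of @MilesBudnek: memset works at byte granularity, and you are working with int values, which are generally larger than 1 byte (typically 4 bytes). Storing 0 sets every byte to 0, so memset can do it, but the int value 1 does not consist of identical bytes. This is why the compiler cannot easily replace the assignment = 1 with a memset.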
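To illustrate the byte-granularity point, here is a minimal sketch. The helper name fill_with_memset is made up for this example, and the value noted for 1 assumes a 4-byte int.

```cpp
#include <cstring>

// Hypothetical helper: fills every byte of an int with the same byte value,
// which is all memset is able to do.
int fill_with_memset(int byte_value)
{
    int x;
    std::memset(&x, byte_value, sizeof x);
    return x;
}

// fill_with_memset(0) yields 0, because every byte of the int 0 is 0.
// fill_with_memset(1) yields 0x01010101 (assuming a 4-byte int), not 1.
```

So a loop storing 0 maps directly onto memset, while a loop storing 1 would need a fill routine that works on whole int elements.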
Note also that -O2 does not enable vectorization in GCC versions before 12, although it apparently does for Clang. On those versions, -ftree-vectorize (included in -O3) is needed for GCC to generate much faster SIMD instructions (though still not as fast as memcpy/memmove/memset on many platforms).
Upvotes: 0
Reputation: 30569
The ranges pointed to by src and dst may overlap, in which case the behavior of memcpy would be undefined. Thus optimizing this function to just call memcpy isn't appropriate.
memmove is well defined for overlapping ranges, but its behavior is different from your function when the src and dst ranges overlap. Consider the following:
int arr[5] = {1, 2, 3, 4, 5};
foo(arr + 1, arr, 4);
Your function would result in arr containing {1, 1, 1, 1, 1} after the call, while memmove is specified to result in arr containing {1, 1, 2, 3, 4}. Thus the compiler can't optimize foo to a call to memmove either.
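The difference can be checked directly. This is a small sketch; run_loop and run_memmove are helper names invented for the comparison, and foo is the loop from the question.

```cpp
#include <array>
#include <cstring>

// Same loop as in the question.
void foo(int* dst, int* src, int n)
{
    for (int i = 0; i < n; ++i)
    {
        dst[i] = src[i];
    }
}

// Hypothetical helper: forward loop on overlapping ranges.
// Each store is read back by the next iteration, so arr[0] is smeared
// across the array: {1, 1, 1, 1, 1}.
std::array<int, 5> run_loop()
{
    std::array<int, 5> arr = {1, 2, 3, 4, 5};
    foo(arr.data() + 1, arr.data(), 4);
    return arr;
}

// Hypothetical helper: memmove behaves as if it copied through a temporary
// buffer, so the original values are shifted by one: {1, 1, 2, 3, 4}.
std::array<int, 5> run_memmove()
{
    std::array<int, 5> arr = {1, 2, 3, 4, 5};
    std::memmove(arr.data() + 1, arr.data(), 4 * sizeof(int));
    return arr;
}
```

Since both results are well defined and they differ, the compiler has to preserve the loop's element-by-element semantics unless it can prove the ranges don't overlap.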
C added the restrict keyword in C99 to tell the compiler that two ranges won't overlap, but C++ has not adopted that particular feature.
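That said, GCC, Clang, and MSVC accept __restrict in C++ as a non-standard extension. A sketch of how the function might look with it follows; foo_restrict is a name chosen for this example, and whether a given compiler then emits a memcpy call depends on the compiler, version, and flags.

```cpp
// Non-standard: __restrict is a compiler extension, not part of ISO C++.
// It promises that dst and src do not overlap; calling this with
// overlapping ranges is undefined behavior.
void foo_restrict(int* __restrict dst, const int* __restrict src, int n)
{
    for (int i = 0; i < n; ++i)
    {
        dst[i] = src[i];
    }
}
```

With the no-overlap promise in place, the loop and memcpy have the same observable behavior, so the compiler is free to substitute one for the other.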
Upvotes: 3