Reputation: 15586
I have this code for memcpy as part of my implementation of the standard C library. It copies memory from src to dest one byte at a time:
void *memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;
    while( len-- )
    {
        *dp++ = *sp++;
    }
    return dest;
}
With gcc -O2, the generated code is reasonable:
memcpy:
.LFB0:
        movq    %rdi, %rax
        testq   %rdx, %rdx
        je      .L2
        xorl    %ecx, %ecx
.L3:
        movzbl  (%rsi,%rcx), %r8d
        movb    %r8b, (%rax,%rcx)
        addq    $1, %rcx
        cmpq    %rdx, %rcx
        jne     .L3
.L2:
        ret
.LFE0:
However, at gcc -O3, GCC optimizes this naive byte-for-byte copy into a memcpy call:
memcpy:
.LFB0:
        testq   %rdx, %rdx
        je      .L7
        subq    $8, %rsp
        call    memcpy
        addq    $8, %rsp
        ret
.L7:
        movq    %rdi, %rax
        ret
.LFE0:
This won't work (memcpy unconditionally calls itself), and it causes a segfault.
I've tried passing -fno-builtin-memcpy and -fno-loop-optimizations, and the same thing occurs.
I'm using GCC version 8.3.0:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-cros-linux-gnu/8.3.0/lto-wrapper
Target: x86_64-cros-linux-gnu
Configured with: ../configure --prefix=/usr/local --libdir=/usr/local/lib64 --build=x86_64-cros-linux-gnu --host=x86_64-cros-linux-gnu --target=x86_64-cros-linux-gnu --enable-checking=release --disable-multilib --enable-threads=posix --disable-bootstrap --disable-werror --disable-libmpx --enable-static --enable-shared --program-suffix=-8.3.0 --with-arch-64=x86-64
Thread model: posix
gcc version 8.3.0 (GCC)
How do I disable the optimization that causes the copy to be transformed into a memcpy call?
Upvotes: 9
Views: 3568
Reputation: 26166
This won't work (memcpy unconditionally calls itself), and it causes a segfault.
Redefining memcpy in a hosted program is undefined behavior, because the name is reserved for the standard library.
How do I disable the optimization that causes the copy to be transformed into a memcpy call (preferably while still compiling with -O3)?
Don't. The best approach is fixing your code instead:
In most cases, you should use another name.
In the rare case that you really are implementing a C library (as discussed in the comments) and really want to reimplement memcpy, use compiler-specific options to do so. For GCC, see -fno-builtin* and -ffreestanding, as well as -nodefaultlibs and -nostdlib.
Upvotes: 6
Reputation: 133988
One thing that seems to be sufficient here: instead of -fno-builtin-memcpy, use -fno-builtin, and use it only when compiling the translation unit that defines memcpy.
An alternative would be to pass -fno-tree-loop-distribute-patterns, though this might be brittle: it only stops the compiler from reorganizing the loop code and then replacing parts of it with calls to mem* functions.
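If changing the flags for a whole translation unit is inconvenient, the same option can be scoped to a single function with GCC's optimize function attribute. The following is only a minimal sketch under some assumptions: it targets GCC, the macro name NO_LOOP_TO_LIBCALL and the function name my_memcpy are hypothetical (a real C library would name the function memcpy), and the GCC manual describes the optimize attribute as intended mainly for debugging, so the generated assembly should still be inspected.

```c
#include <stddef.h>

/* GCC only: compile the annotated function with
   -fno-tree-loop-distribute-patterns, the option named above.
   Clang does not implement the optimize attribute, so the macro
   expands to nothing there. NO_LOOP_TO_LIBCALL is a made-up name. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* my_memcpy is a hypothetical stand-in for the library's memcpy, so the
   sketch can be built and tested in an ordinary hosted program. */
NO_LOOP_TO_LIBCALL
void *my_memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = dest;
    const char *sp = src;

    /* Plain byte-for-byte copy, same logic as in the question. */
    while (len--) {
        *dp++ = *sp++;
    }
    return dest;
}
```

Compiling this with gcc -O3 -S and checking that the loop body contains no call instruction is a quick way to confirm whether the attribute had the intended effect on a given GCC version.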
Or, since you cannot rely on anything in the C library, perhaps -ffreestanding is in order.
Upvotes: 15