Reputation: 48128
Using GCC you can do something like this.
void foo(MyStruct *a, const MyStruct *b)
{
    memcpy(&a[0], b, sizeof(*a));
    memcpy(&a[1], b, sizeof(*a));
    memcpy(&a[2], b, sizeof(*a));
}
When writing portable code with modern C compilers*, this can be optimized to produce the same asm as:
void foo(MyStruct a[3], const MyStruct *b)
{
    a[0] = *b;
    a[1] = *b;
    a[2] = *b;
}
My question is: is it reasonable to assume the function call to memcpy will always be optimized out?
I'm asking because I was considering using memcpy in a macro that gets instantiated many times, with the size known at compile time. If this will call memcpy on some platforms, I'd prefer not to use it at all.
For example: Implement generic swap macro in C
* Modern C compilers (GCC/Clang/MSVC/ICC), with a standard/safe optimization level set.
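To make the macro use case concrete, here is a minimal sketch of a memcpy-based swap with a compile-time size. The names SWAP_VALUES and Pair are hypothetical (neither appears in the question or in the linked post), and this is only one way such a macro could be written.

```c
#include <string.h>

/* Hypothetical example type; the question does not define one. */
typedef struct {
    int x;
    double y;
} Pair;

/*
 * Hypothetical generic swap (illustrative name, not from the question).
 * Both arguments must be distinct lvalues of the same size: swapping an
 * object with itself would hand overlapping regions to memcpy, which is
 * undefined. sizeof(a) is a compile-time constant, which is what gives
 * an optimizing compiler the chance to expand the three copies inline
 * instead of emitting calls.
 */
#define SWAP_VALUES(a, b)                                          \
    do {                                                           \
        unsigned char swap_tmp_[sizeof(a)];                        \
        (void)sizeof(char[(sizeof(a) == sizeof(b)) ? 1 : -1]);     \
        memcpy(swap_tmp_, &(a), sizeof(a));                        \
        memcpy(&(a), &(b), sizeof(a));                             \
        memcpy(&(b), swap_tmp_, sizeof(a));                        \
    } while (0)

/* Small wrapper so the generated asm for one instantiation is easy to inspect. */
void swap_pairs(Pair *p, Pair *q)
{
    SWAP_VALUES(*p, *q);
}
```

The sizeof(char[...]) line uses a negative array size to turn a size mismatch into a compile error. Whether the memcpy calls actually disappear still depends on the compiler and optimization level, so inspecting the output for swap_pairs (for example with -O2 -S) is the only way to be sure on a given platform.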
Upvotes: 2
Views: 1336
Reputation: 8405
The memcpy function is a very general function that takes void * parameters as its inputs.
From ISO/IEC 9899:1999 (C99):
Synopsis:
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
Description:
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
As you can see, the description is quite vague as to which optimizations may be applied to it. However, the function prototype does use the keyword restrict, which allows optimizing compilers to treat the two pointers as referring to distinct memory regions.
But again from §6.7.3.1 Formal definition of restrict:
A translator is free to ignore any or all aliasing implications of uses of restrict.
This suggests that perhaps not all optimizing compilers honor the qualifier. In that case memcpy would, from the compiler's perspective, be working on possibly overlapping memory regions, and the compiler would be unable to deduce functional equivalence to a[0] = *b, since that assignment may be modifying the value of b as well.
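For the non-overlapping case the question has in mind, the two forms can at least be compared directly. This is a rough sketch only: the MyStruct layout and the function names fill_memcpy and fill_assign are made up for illustration, since the question defines neither.

```c
#include <string.h>

/* Hypothetical layout; the question never shows the real MyStruct. */
typedef struct {
    int id;
    double value;
} MyStruct;

/* memcpy form from the question: copy *b into a[0..2]. */
void fill_memcpy(MyStruct *a, const MyStruct *b)
{
    memcpy(&a[0], b, sizeof(*a));
    memcpy(&a[1], b, sizeof(*a));
    memcpy(&a[2], b, sizeof(*a));
}

/* Assignment form from the question: same effect on the members
   as long as b does not point into a. */
void fill_assign(MyStruct a[3], const MyStruct *b)
{
    a[0] = *b;
    a[1] = *b;
    a[2] = *b;
}
```

Compiling both with optimization enabled and diffing the asm shows whether a given compiler treats them the same. As far as I can tell, C99 §6.5.16.1 also leaves assignment between partially overlapping objects undefined, so both forms rely on b not partially overlapping a.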
The standard may have changed in C11, but I don't have a copy of that, so I can't say...
Edit:
The N1570 draft has the same thing written for both sections so it should be the same, though I haven't read the whole thing to make sure of this.
Upvotes: 0
Reputation: 1
Some naive C compilers (like tinycc) don't optimize much and won't optimize calls to memcpy; but they produce such slow code that nobody caring about binary code performance would use them.
However, a good reason to use tcc might be when you don't care at all about runtime performance, but you care a lot about having a tiny compiler that compiles quickly.
In theory, optimization is not mandated by the C99 or C11 standard (even a real or virtual computer is not required: you could run a standard C program with a bunch of human slaves, but that is unethical, unreliable, and inefficient). And the C99 standard does not require a compiler; it could be a naive interpreter and still be a standard conforming implementation.
In practice, any serious C compiler, when asked to optimize, would optimize your calls to memcpy.
See also this answer on Programmers.
So I would use memcpy as you do, but document that a modern optimizing C compiler is expected (and perhaps recommend recent versions of compilers, such as GCC 4.8 or later, or Clang 3.4 or later).
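One way to document that expectation in the code itself is a preprocessor check on the compiler version. The sketch below is an assumption-laden illustration: the flag name EXPECT_INLINE_MEMCPY is hypothetical, and the version thresholds are simply the ones suggested above. It only records what is expected; it does not prove that a given build actually inlines memcpy.

```c
/*
 * Hypothetical flag (illustrative name): 1 when the compiler matches the
 * versions recommended above, 0 otherwise. Clang is tested first because
 * it also defines __GNUC__ for compatibility.
 */
#if defined(__clang__)
#  if (__clang_major__ > 3) || (__clang_major__ == 3 && __clang_minor__ >= 4)
#    define EXPECT_INLINE_MEMCPY 1
#  endif
#elif defined(__GNUC__)
#  if (__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)
#    define EXPECT_INLINE_MEMCPY 1
#  endif
#endif

#ifndef EXPECT_INLINE_MEMCPY
#  define EXPECT_INLINE_MEMCPY 0   /* unknown or older compiler */
#endif

/* Lets calling code or a build log report what was detected. */
int expect_inline_memcpy(void)
{
    return EXPECT_INLINE_MEMCPY;
}
```

A project could use the flag to pick between a memcpy-based macro and a fallback, or just to print a note at build time. Other compilers, and builds with optimization turned off, fall into the 0 branch or need their own checks.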
Upvotes: 3