Alexey Romanov

Reputation: 170745

Do GCC and Clang optimize field-by-field struct copy?

E.g. given

typedef struct A {
    int a;
    int b;
    int c;
} A;

typedef struct B {
    int d;
    int e;
    int f;
} B;

void f(B& b1, A& a2) {
    b1.d = a2.a;
    b1.e = a2.b;
    b1.f = a2.c;
}

f could be replaced by a memcpy (especially if the structs had more fields).
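For reference, the memcpy version of f could be sketched like this (f_memcpy is a hypothetical name; it is valid only because A and B have identical member types in the same order, hence the same layout):

```cpp
#include <cstring>

typedef struct A { int a, b, c; } A;
typedef struct B { int d, e, f; } B;

// Hypothetical memcpy variant of f: relies on A and B having
// identical layout (same member types in the same order).
void f_memcpy(B& b1, A& a2) {
    static_assert(sizeof(A) == sizeof(B), "layouts must match");
    std::memcpy(&b1, &a2, sizeof(A));
}
```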

  1. Will both versions produce equivalent code?

  2. What if the structure we copy to has fewer fields than A? I.e.

    typedef struct C {
        int g;
        int h;
    } C;
    
    void h(C& c1, A& a2) {
        c1.g = a2.a;
        c1.h = a2.b;
    }
    

I am interested because I am generating code which includes struct copies like this, normally changing the order of fields, and I want to know if these cases should be treated specially.

C tag included because I expect the behavior in C to be the same (modulo pointers instead of references).
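The partial copy in h can likewise be sketched as a memcpy of only sizeof(C) bytes (h_memcpy is a hypothetical name; this assumes C's members line up with A's first two members):

```cpp
#include <cstring>

typedef struct A { int a, b, c; } A;
typedef struct C { int g, h; } C;

// Hypothetical memcpy variant of h: copies only the leading
// sizeof(C) bytes of A, relying on g/h lining up with a/b.
void h_memcpy(C& c1, A& a2) {
    static_assert(sizeof(C) <= sizeof(A), "C must fit in a prefix of A");
    std::memcpy(&c1, &a2, sizeof(C));
}
```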

Upvotes: 5

Views: 503

Answers (3)

Richard Smith

Reputation: 14158

Your testcase does not load and store enough memory for a conversion to memcpy to be worthwhile. Using twice as many members:

typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;
void f(B& b1, A& a2) {
  b1.d = a2.a;
  b1.e = a2.b;
  b1.f = a2.c;
  b1.s = a2.p;
  b1.t = a2.q;
  b1.u = a2.r;
}

... LLVM optimizes the code to:

f(B&, A&):                             # @f(B&, A&)
        movups  (%rsi), %xmm0
        movups  %xmm0, (%rdi)
        movl    16(%rsi), %eax
        movl    %eax, 16(%rdi)
        movl    20(%rsi), %eax
        movl    %eax, 20(%rdi)
        retq

... with an unaligned 16-byte load/store copying the first four members.
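To confirm at runtime that the field-by-field copy and a byte copy agree for this larger struct, a quick sketch (the names f_fields and f_bytes are mine):

```cpp
#include <cstring>

typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;

// Field-by-field copy, as in the answer above.
void f_fields(B& b1, A& a2) {
    b1.d = a2.a; b1.e = a2.b; b1.f = a2.c;
    b1.s = a2.p; b1.t = a2.q; b1.u = a2.r;
}

// Byte copy, valid because the two layouts are identical.
void f_bytes(B& b1, A& a2) {
    std::memcpy(&b1, &a2, sizeof(A));
}
```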

Upvotes: 3

Trollliar

Reputation: 866

The general answer: it depends. As a free function it will generally generate code quite similar to std::memmove (using temporary values to avoid problems with possible overlap, see the docs), but after inlining it can be folded into a std::memcpy with further optimizations (via SSE, for example).

EDIT:

You can see the fully optimized output and experiment on gcc.godbolt by using volatile variables: this trick lets you see the optimizations while preventing the compiler from eliding the results, as it could in real ("battle") code. Take this.
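A minimal sketch of the volatile trick described above (the globals and names are mine): marking the source and destination volatile keeps the optimizer from discarding the copy as dead code, so the copy itself remains visible in the assembly output.

```cpp
typedef struct A { int a, b, c; } A;
typedef struct B { int d, e, f; } B;

// volatile forces the compiler to keep these loads and stores,
// so the generated copy code stays visible in the assembly.
volatile A g_src = {1, 2, 3};
volatile B g_dst;

void copy_once() {
    g_dst.d = g_src.a;
    g_dst.e = g_src.b;
    g_dst.f = g_src.c;
}
```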

Upvotes: 2

Alexey Romanov

Reputation: 170745

According to godbolt.org, x86-64 gcc 6.2 with -O2 produces

mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov eax, DWORD PTR [rsi+4]
mov DWORD PTR [rdi+4], eax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax

for field-by-field copy,

mov rax, QWORD PTR [rsi]
mov QWORD PTR [rdi], rax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax

for memcpy. Both clang and icc show a similar difference. A bit disappointing.

Upvotes: 4
