Alexey Romanov

Reputation: 170745

Do GCC and Clang optimize field-by-field struct copy?

E.g. given

typedef struct A {
    int a;
    int b;
    int c;
} A;

typedef struct B {
    int d;
    int e;
    int f;
} B;

void f(B& b1, A& a2) {
    b1.d = a2.a;
    b1.e = a2.b;
    b1.f = a2.c;
}

f could be replaced by a memcpy (especially if the structs had more fields).
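For reference, the memcpy version of f could be sketched like this (f_memcpy is a hypothetical name; it is valid only because A and B have identical member types in the same order, hence the same layout):

```cpp
#include <cstring>

typedef struct A { int a, b, c; } A;
typedef struct B { int d, e, f; } B;

// Hypothetical memcpy variant of f: relies on A and B having
// identical layout (same member types in the same order).
void f_memcpy(B& b1, A& a2) {
    static_assert(sizeof(A) == sizeof(B), "layouts must match");
    std::memcpy(&b1, &a2, sizeof(A));
}
```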

  1. Will both versions produce equivalent code?

  2. What if the structure we copy to has fewer fields than A? I.e.

    typedef struct C {
        int g;
        int h;
    } C;
    
    void h(C& c1, A& a2) {
        c1.g = a2.a;
        c1.h = a2.b;
    }
    

I am interested because I am generating code which includes struct copies like this, normally changing the order of fields, and I want to know if these cases should be treated specially.

C tag included because I expect the behavior in C to be the same (modulo pointers instead of references).
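The partial copy in h can likewise be sketched as a memcpy of only sizeof(C) bytes (h_memcpy is a hypothetical name; this assumes C's members line up with A's first two members):

```cpp
#include <cstring>

typedef struct A { int a, b, c; } A;
typedef struct C { int g, h; } C;

// Hypothetical memcpy variant of h: copies only the leading
// sizeof(C) bytes of A, relying on g/h lining up with a/b.
void h_memcpy(C& c1, A& a2) {
    static_assert(sizeof(C) <= sizeof(A), "C must fit in a prefix of A");
    std::memcpy(&c1, &a2, sizeof(C));
}
```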

Upvotes: 5

Views: 503

Answers (3)

Richard Smith

Reputation: 14158

Your testcase does not load and store enough memory for a conversion to memcpy to be worthwhile. Using twice as many members:

typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;
void f(B& b1, A& a2) {
  b1.d = a2.a;
  b1.e = a2.b;
  b1.f = a2.c;
  b1.s = a2.p;
  b1.t = a2.q;
  b1.u = a2.r;
}

... LLVM optimizes the code to:

f(B&, A&):                             # @f(B&, A&)
        movups  (%rsi), %xmm0
        movups  %xmm0, (%rdi)
        movl    16(%rsi), %eax
        movl    %eax, 16(%rdi)
        movl    20(%rsi), %eax
        movl    %eax, 20(%rdi)
        retq

... with an unaligned 16-byte load/store copying the first four members.
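To confirm at runtime that the field-by-field copy and a byte copy agree for this larger struct, a quick sketch (the names f_fields and f_bytes are mine):

```cpp
#include <cstring>

typedef struct A { int a, b, c, p, q, r; } A;
typedef struct B { int d, e, f, s, t, u; } B;

// Field-by-field copy, as in the answer above.
void f_fields(B& b1, A& a2) {
    b1.d = a2.a; b1.e = a2.b; b1.f = a2.c;
    b1.s = a2.p; b1.t = a2.q; b1.u = a2.r;
}

// Byte copy, valid because the two layouts are identical.
void f_bytes(B& b1, A& a2) {
    std::memcpy(&b1, &a2, sizeof(A));
}
```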

Upvotes: 3

Trollliar

Reputation: 866

The general answer: it depends. As a free function it will generally generate code quite similar to std::memmove (using temporary values to avoid problems with possible overlap, see the docs), but after inlining it can be folded into a std::memcpy with further optimizations (via SSE, for example).

EDIT:

You can see the fully optimized output and experiment on gcc.godbolt by using volatile variables: this trick lets you see the optimizations while preventing the compiler from eliding the results, as it could in real ("battle") code. Take this.
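A minimal sketch of the volatile trick described above (the globals and names are mine): marking the source and destination volatile keeps the optimizer from discarding the copy as dead code, so the copy itself remains visible in the assembly output.

```cpp
typedef struct A { int a, b, c; } A;
typedef struct B { int d, e, f; } B;

// volatile forces the compiler to keep these loads and stores,
// so the generated copy code stays visible in the assembly.
volatile A g_src = {1, 2, 3};
volatile B g_dst;

void copy_once() {
    g_dst.d = g_src.a;
    g_dst.e = g_src.b;
    g_dst.f = g_src.c;
}
```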

Upvotes: 2

Alexey Romanov

Reputation: 170745

According to godbolt.org, x86-64 gcc 6.2 with -O2 produces

mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov eax, DWORD PTR [rsi+4]
mov DWORD PTR [rdi+4], eax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax

for field-by-field copy,

mov rax, QWORD PTR [rsi]
mov QWORD PTR [rdi], rax
mov eax, DWORD PTR [rsi+8]
mov DWORD PTR [rdi+8], eax

for memcpy. Both clang and icc show a similar difference. A bit disappointing.

Upvotes: 4
