Philipp Claßen

Reputation: 44009

gcc: avoiding strict-aliasing violation warning by explicit memcpy

I have a class that occupies 64 bits in memory. To implement equality, I used reinterpret_cast<uint64_t*>, which triggers this warning on gcc 7.2 (but not on clang 5.0):

$ g++ -O3 -Wall -std=c++17 -g -c example.cpp 
example.cpp: In member function ‘bool X::eq_via_cast(X)’:
example.cpp:27:85: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     return *reinterpret_cast<uint64_t*>(this) == *reinterpret_cast<uint64_t*>(&x);
                                                                                 ^

From my understanding, dereferencing the cast pointer is undefined behavior unless the target type is the object's actual type or a character type such as char. For instance, some architectures impose alignment restrictions when loading values. That is why I tried alternative approaches.
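To illustrate the character-type exception mentioned above, here is a minimal sketch that inspects two objects byte by byte through unsigned char*. The helper name bytes_equal is hypothetical and not part of the original code; it only shows that this kind of access is permitted, not that it is the fastest option.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): compares the object
// representations of two values by reading them through unsigned char*,
// which the aliasing rules allow for any object type.
template <typename T>
bool bytes_equal(const T& lhs, const T& rhs) {
    const auto* l = reinterpret_cast<const unsigned char*>(&lhs);
    const auto* r = reinterpret_cast<const unsigned char*>(&rhs);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (l[i] != r[i]) {
            return false;
        }
    }
    return true;
}
```

Like any bitwise comparison, this only matches logical equality if the type has no padding.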

Here is the source code of a simplified version (link to godbolt):

#include <cstdint>
#include <cstring>

struct Y
{
    uint32_t x;
    bool operator==(Y y) { return x == y.x; }
};

struct X
{
    Y a;
    int16_t b;
    int16_t c;

    uint64_t to_uint64() {
        uint64_t result;
        std::memcpy(&result, this, sizeof(uint64_t));
        return result;
    }

    bool eq_via_memcpy(X x) {
        return to_uint64() == x.to_uint64();
    }

    bool eq_via_cast(X x) {
        return *reinterpret_cast<uint64_t*>(this) == *reinterpret_cast<uint64_t*>(&x);
    }

    bool eq_via_comparisons(X x) {
        return a == x.a && b == x.b && c == x.c;
    }
};
static_assert(sizeof(X) == sizeof(uint64_t));

bool via_memcpy(X x1, X x2) {
    return x1.eq_via_memcpy(x2);
}

bool via_cast(X x1, X x2) {
    return x1.eq_via_cast(x2);
}

bool via_comparisons(X x1, X x2) {
    return x1.eq_via_comparisons(x2);
}

Avoiding the cast by explicitly copying the data via memcpy prevents the warning. As far as I understand it, it should also be portable.
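The memcpy approach can be generalized a little. The following is only a sketch under the assumption that the type is trivially copyable and exactly 64 bits wide; the names to_bits, Packed and bits_equal are made up for illustration, and Packed is a flattened stand-in for X.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not from the original post): copies the object
// representation of an 8-byte, trivially copyable type into a uint64_t.
template <typename T>
std::uint64_t to_bits(const T& value) {
    static_assert(sizeof(T) == sizeof(std::uint64_t),
                  "T must be exactly 64 bits wide");
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy requires a trivially copyable type");
    std::uint64_t result;
    std::memcpy(&result, &value, sizeof(result));
    return result;
}

// Stand-in for X from the question, flattened for brevity.
struct Packed {
    std::uint32_t a;
    std::int16_t b;
    std::int16_t c;
};

// Bitwise equality via two memcpy calls and one integer comparison.
inline bool bits_equal(const Packed& lhs, const Packed& rhs) {
    return to_bits(lhs) == to_bits(rhs);
}
```

The static assertions turn a size mismatch into a compile-time error instead of a silent partial copy.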

Looking at the generated assembly (gcc 7.2 with -std=c++17 -O3), the memcpy version is optimized perfectly, while the straightforward member-wise comparison leads to less efficient code:

via_memcpy(X, X):
  cmp rdi, rsi
  sete al
  ret

via_cast(X, X):
  cmp rdi, rsi
  sete al
  ret

via_comparisons(X, X):
  xor eax, eax
  cmp esi, edi
  je .L7
  rep ret
.L7:
  sar rdi, 32
  sar rsi, 32
  cmp edi, esi
  sete al
  ret

Very similar with clang 5.0 (-std=c++17 -O3):

via_memcpy(X, X): # @via_memcpy(X, X)
  cmp rdi, rsi
  sete al
  ret

via_cast(X, X): # @via_cast(X, X)
  cmp rdi, rsi
  sete al
  ret

via_comparisons(X, X): # @via_comparisons(X, X)
  cmp edi, esi
  jne .LBB2_1
  mov rax, rdi
  shr rax, 32
  mov rcx, rsi
  shr rcx, 32
  shl eax, 16
  shl ecx, 16
  cmp ecx, eax
  jne .LBB2_3
  shr rdi, 48
  shr rsi, 48
  shl edi, 16
  shl esi, 16
  cmp esi, edi
  sete al
  ret
.LBB2_1:
  xor eax, eax
  ret
.LBB2_3:
  xor eax, eax
  ret

From this experiment, it looks like the memcpy version is the best approach in performance-critical parts of the code.

Questions:

Update:

As UKMonkey pointed out, memcmp is more natural when doing bitwise comparisons. It also compiles down to the same optimized version:

bool eq_via_memcmp(X x) {
    return std::memcmp(this, &x, sizeof(*this)) == 0;
}

Here is the updated godbolt link. It should also be portable (sizeof(*this) is 64 bits), so I assume it is the best solution so far.

Upvotes: 4

Views: 503

Answers (1)

Philipp Claßen

Reputation: 44009

In C++17, memcmp in combination with std::has_unique_object_representations (from <type_traits>) can be used:

bool eq_via_memcmp(X x) {
    static_assert(std::has_unique_object_representations_v<X>);
    return std::memcmp(this, &x, sizeof(*this)) == 0;
}

Compilers should be able to optimize it to one comparison (godbolt link):

via_memcmp(X, X):
  cmp rdi, rsi
  sete al
  ret

The static assertion makes sure that the class X does not contain padding bits. Otherwise, comparing two logically equivalent objects could return false because the content of the padding bits may differ. In that case, it is safer to reject that code at compile time.
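As a rough illustration of what the trait detects, consider the two hypothetical structs below (they are not from the original code). On common ABIs such as x86-64 with gcc or clang, the second one contains padding bytes after the first member, so the trait reports false for it. The exact layout is implementation-defined, so treat this as a sketch rather than a guarantee.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical types for illustration only.
struct NoPadding {
    std::uint32_t a;
    std::int16_t b;
    std::int16_t c;  // members fill all 8 bytes on common ABIs
};

struct WithPadding {
    std::uint8_t tag;     // typically followed by 3 padding bytes
    std::uint32_t value;
};

// True only if equal values always have identical object representations.
constexpr bool no_padding_ok =
    std::has_unique_object_representations_v<NoPadding>;
constexpr bool with_padding_ok =
    std::has_unique_object_representations_v<WithPadding>;
```

A static_assert on the trait would therefore reject a memcmp-based equality for WithPadding at compile time on such platforms.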

(Note: C++20 adds std::bit_cast, which could be used as an alternative to memcmp. You still have to make sure that no padding is involved, for the same reason.)

Upvotes: 2
