Reputation: 44009
I have a class that occupies 64 bits in memory. To implement equality, I used reinterpret_cast<uint64_t*>, but it results in this warning on gcc 7.2 (but not clang 5.0):
$ g++ -O3 -Wall -std=c++17 -g -c example.cpp
example.cpp: In member function ‘bool X::eq_via_cast(X)’:
example.cpp:27:85: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
return *reinterpret_cast<uint64_t*>(this) == *reinterpret_cast<uint64_t*>(&x); ^
From my understanding, dereferencing the result of such a cast is undefined behavior unless you are casting to the object's actual type or to char*. For instance, there could be architecture-specific layout restrictions when loading values. That is why I tried alternative approaches.
Here is the source code of a simplified version (link to godbolt):
#include <cstdint>
#include <cstring>

struct Y
{
    uint32_t x;
    bool operator==(Y y) { return x == y.x; }
};

struct X
{
    Y a;
    int16_t b;
    int16_t c;

    uint64_t to_uint64() {
        uint64_t result;
        std::memcpy(&result, this, sizeof(uint64_t));
        return result;
    }

    bool eq_via_memcpy(X x) {
        return to_uint64() == x.to_uint64();
    }

    bool eq_via_cast(X x) {
        return *reinterpret_cast<uint64_t*>(this) == *reinterpret_cast<uint64_t*>(&x);
    }

    bool eq_via_comparisons(X x) {
        return a == x.a && b == x.b && c == x.c;
    }
};

static_assert(sizeof(X) == sizeof(uint64_t));

bool via_memcpy(X x1, X x2) {
    return x1.eq_via_memcpy(x2);
}

bool via_cast(X x1, X x2) {
    return x1.eq_via_cast(x2);
}

bool via_comparisons(X x1, X x2) {
    return x1.eq_via_comparisons(x2);
}
Avoiding the cast by explicitly copying the data via memcpy prevents the warning. As far as I understand it, it should also be portable.
Looking at the assembly (gcc 7.2 with -std=c++17 -O3), memcpy is optimized perfectly while the straightforward comparisons lead to less efficient code:
via_memcpy(X, X):
    cmp rdi, rsi
    sete al
    ret
via_cast(X, X):
    cmp rdi, rsi
    sete al
    ret
via_comparisons(X, X):
    xor eax, eax
    cmp esi, edi
    je .L7
    rep ret
.L7:
    sar rdi, 32
    sar rsi, 32
    cmp edi, esi
    sete al
    ret
The result is very similar with clang 5.0 (-std=c++17 -O3):
via_memcpy(X, X): # @via_memcpy(X, X)
    cmp rdi, rsi
    sete al
    ret
via_cast(X, X): # @via_cast(X, X)
    cmp rdi, rsi
    sete al
    ret
via_comparisons(X, X): # @via_comparisons(X, X)
    cmp edi, esi
    jne .LBB2_1
    mov rax, rdi
    shr rax, 32
    mov rcx, rsi
    shr rcx, 32
    shl eax, 16
    shl ecx, 16
    cmp ecx, eax
    jne .LBB2_3
    shr rdi, 48
    shr rsi, 48
    shl edi, 16
    shl esi, 16
    cmp esi, edi
    sete al
    ret
.LBB2_1:
    xor eax, eax
    ret
.LBB2_3:
    xor eax, eax
    ret
From this experiment, it looks like the memcpy version is the best approach in performance-critical parts of the code.
Questions:
- Is the memcpy version portable C++ code?
- Can I rely on compilers optimizing away the memcpy call like in this example?
Update:
As UKMonkey pointed out, memcmp is more natural when doing bitwise comparisons. It also compiles down to the same optimized version:
bool eq_via_memcmp(X x) {
    return std::memcmp(this, &x, sizeof(*this)) == 0;
}
Here is the updated godbolt link. It should also be portable (sizeof(*this) is 8 bytes, i.e. 64 bits), so I assume it is the best solution so far.
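For reference, the same memcmp idea can also be written as a const-correct free operator. This is only a sketch: it redeclares X and Y without the member functions from the question, and it assumes that X has no padding, which the size check suggests but does not prove.

```cpp
#include <cstdint>
#include <cstring>

// Stripped-down copies of the types from the question (member functions omitted).
struct Y { std::uint32_t x; };
struct X {
    Y a;
    std::int16_t b;
    std::int16_t c;
};

static_assert(sizeof(X) == sizeof(std::uint64_t), "X is expected to occupy 8 bytes");

// Bitwise equality: compares the object representations byte by byte.
// Assumption: X contains no padding, so equal values imply equal bytes.
inline bool operator==(const X& lhs, const X& rhs) {
    return std::memcmp(&lhs, &rhs, sizeof(X)) == 0;
}

inline bool operator!=(const X& lhs, const X& rhs) {
    return !(lhs == rhs);
}
```

Taking both operands by const reference lets the operator work on const objects and temporaries, which the by-value member functions above only allow for the right-hand side.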
Upvotes: 4
Views: 503
Reputation: 44009
In C++17, memcmp in combination with has_unique_object_representations can be used:
bool eq_via_memcmp(X x) {
    static_assert(std::has_unique_object_representations_v<X>);
    return std::memcmp(this, &x, sizeof(*this)) == 0;
}
Compilers should be able to optimize it to one comparison (godbolt link):
via_memcmp(X, X):
    cmp rdi, rsi
    sete al
    ret
The static assertion makes sure that the class X does not contain padding bits. Otherwise, comparing two logically equivalent objects could return false because the content of the padding bits may differ. In that case, it is safer to reject that code at compile time.
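To illustrate what the assertion guards against, here is a small sketch. The types Tight and Padded and the helper bitwise_equal are hypothetical names introduced for this example. The comments about padding assume a typical ABI where uint32_t is 4-byte aligned, so Padded gets two trailing padding bytes.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical example types, not taken from the question.
struct Tight {             // 4 + 2 + 2 bytes: no padding on common ABIs
    std::uint32_t a;
    std::int16_t b;
    std::int16_t c;
};

struct Padded {            // 4 + 2 bytes: usually rounded up to 8 with padding
    std::uint32_t a;
    std::int16_t b;
};

// Generic bitwise equality that refuses to compile for types whose
// object representation is not unique (for example, types with padding).
template <typename T>
bool bitwise_equal(const T& lhs, const T& rhs) {
    static_assert(std::has_unique_object_representations_v<T>,
                  "T has padding or otherwise non-unique representations");
    return std::memcmp(&lhs, &rhs, sizeof(T)) == 0;
}

// The trait is a compile-time constant, so it can be inspected directly.
constexpr bool tight_is_unique = std::has_unique_object_representations_v<Tight>;
constexpr bool padded_is_unique = std::has_unique_object_representations_v<Padded>;
```

With these definitions, bitwise_equal works for two Tight objects, while calling it with Padded arguments should be rejected at compile time on ABIs where Padded actually contains padding.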
(Note: Presumably, C++20 will add std::bit_cast, which could be used as an alternative to memcmp. But still, you have to make sure that no padding is involved for the same reason.)
Upvotes: 2