Reputation: 942
In an attempt to avoid breaking strict aliasing rules, I introduced memcpy in a couple of places in my code, expecting it to be a no-op. The following example produces a call to memcpy (or equivalent) on gcc and clang. Specifically, fool<40> always does, while foo does on gcc but not clang, and fool<2> does on clang but not gcc. When and how can this be optimized away?
#include <cstdint>
#include <cstring>

uint64_t bar(const uint16_t *buf) {
    uint64_t num[2];
    memcpy(&num, buf, 16);
    return num[0] + num[1];
}

uint64_t foo(const uint16_t *buf) {
    uint64_t num[3];
    memcpy(&num, buf, sizeof(num));
    return num[0] + num[1];
}

template <int SZ>
uint64_t fool(const uint16_t *buf) {
    uint64_t num[SZ];
    memcpy(&num, buf, sizeof(num));
    uint64_t ret = 0;
    for (int i = 0; i < SZ; ++i)
        ret += num[i];
    return ret;
}

template uint64_t fool<2>(const uint16_t*);
template uint64_t fool<40>(const uint16_t*);
And a link to the compiled output (godbolt).
Upvotes: 4
Views: 1369
Reputation: 15933
I can't tell you exactly why the respective compilers fail to optimize the code the way you'd hope in these specific cases. I guess each compiler is either unable to track, in every case, the relationship that memcpy establishes between the target array and the source memory (as we can see, they do recognize this relationship at least in some cases), or some heuristic tells it to choose not to make use of it.
Anyway, since compilers don't seem to behave as we would hope when we rely on them tracking the entire array, we can try to make the relationship more obvious to the compiler by doing the memcpy element by element. This seems to produce the desired result on both compilers. Note that I had to manually unroll the initialization in bar and foo, as clang will otherwise do a copy again.
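A minimal sketch of what the element-by-element approach could look like, assuming the uint64_t values are packed back to back in buf. The *_elementwise names and the kStride constant are hypothetical, picked only so the sketch does not clash with the functions in the question. Whether the copies actually disappear depends on the compiler version and optimization flags, so check the generated assembly rather than taking it for granted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper constant: number of uint16_t elements covered by one uint64_t.
constexpr std::size_t kStride = sizeof(std::uint64_t) / sizeof(std::uint16_t);

// Manually unrolled: one memcpy per destination element.
std::uint64_t bar_elementwise(const std::uint16_t* buf) {
    std::uint64_t num[2];
    std::memcpy(&num[0], buf + 0 * kStride, sizeof(num[0]));
    std::memcpy(&num[1], buf + 1 * kStride, sizeof(num[1]));
    return num[0] + num[1];
}

// Same idea for the three-element case; num[2] is copied but never read,
// mirroring the original foo.
std::uint64_t foo_elementwise(const std::uint16_t* buf) {
    std::uint64_t num[3];
    std::memcpy(&num[0], buf + 0 * kStride, sizeof(num[0]));
    std::memcpy(&num[1], buf + 1 * kStride, sizeof(num[1]));
    std::memcpy(&num[2], buf + 2 * kStride, sizeof(num[2]));
    return num[0] + num[1];
}

// Generic version: copy each element separately in a loop, then sum.
template <int SZ>
std::uint64_t fool_elementwise(const std::uint16_t* buf) {
    std::uint64_t num[SZ];
    for (int i = 0; i < SZ; ++i)
        std::memcpy(&num[i], buf + i * kStride, sizeof(num[i]));
    std::uint64_t ret = 0;
    for (int i = 0; i < SZ; ++i)
        ret += num[i];
    return ret;
}
```

Each std::memcpy here has a small, constant size and a distinct destination, which is the kind of copy compilers commonly lower to a plain load. The results are identical to the whole-array versions; only the shape of the copy changes.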
Apart from that, note that in C++ you should use std::memcpy, std::uint64_t, etc., since the standard headers are not guaranteed to also introduce these names into the global namespace (though I'm not aware of any implementation that doesn't do that).
Upvotes: 2