Consider the following C functions:
```c
int clear_byte0(int x)
{
    return x & 0xFFFFFF00;
}

int clear_byte1(int x)
{
    return x & 0xFFFF00FF;
}

int clear_bytes01(int x)
{
    return x & 0xFFFF0000;
}
```
They use the bitwise-and operator to clear out specific bits of an integer value.
The x86 architecture has partial registers, which can be zeroed to achieve the same purpose (e.g. `clear_byte0` can be implemented by zeroing `al`, assuming that `x` is stored in `eax`). However, it may not always be optimal to do so. Compilers disagree on the best way to implement these functions, as shown in this godbolt example (GCC, clang and MSVC).
Clang and MSVC don't use partial registers at all, and instead mimic the C code using an `and` instruction. They do differ in that clang first moves to `eax`, then `and`s, while MSVC first `and`s and then moves to `eax` (does it make a difference?).
GCC chooses to zero the partial registers for all three functions, by `xor`-ing `al`, `ah` and `ax`, respectively. Is this better or worse than using `and`? Can it cause partial register stalls, or the insertion of additional merging uops later on?
Another peculiarity in GCC's behavior is the way it zeroes the "high 8" register (`ah`). In most cases, it uses `xor ah, ah`, which is consistent with everything else. However, for some reason, when the source register is `ecx` or `edx` (probably `ebx` too), meaning it has access to `ch`/`dh`, it insists on using `mov eax, ecx` + `xor ah, ch` (instead of `xor ah, ah`). Does this mean that `xor reg8, reg8` has a penalty if the source and destination are the same?
Which of these approaches is optimal (and for which microarchitectures)?