DarkAtom

How to zero certain bytes of a register?

Consider the following C functions:

int clear_byte0(int x)
{
    return x & 0xFFFFFF00;
}

int clear_byte1(int x)
{
    return x & 0xFFFF00FF;
}

int clear_bytes01(int x)
{
    return x & 0xFFFF0000;
}

They use the bitwise-and operator to clear specific bytes of an integer value.
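For reference, each of these constants is just "all ones except the byte(s) being cleared". The sketch below derives them from a hypothetical helper, `clear_byte_mask` (not part of the question). It works on `uint32_t` rather than `int`, so the result of `x & 0xFFFFFF00` (which has type `unsigned int` on a platform with 32-bit `int`) never has to be converted back to a signed type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask that clears byte n (0 = least significant)
   of a 32-bit value and keeps the other three bytes. */
static uint32_t clear_byte_mask(unsigned n)
{
    assert(n < 4);  /* only bytes 0..3 exist in a 32-bit value */
    return ~(UINT32_C(0xFF) << (8u * n));
}

/* The question's three functions, rewritten on uint32_t. */
static uint32_t clear_byte0_u(uint32_t x)   { return x & clear_byte_mask(0); }
static uint32_t clear_byte1_u(uint32_t x)   { return x & clear_byte_mask(1); }
static uint32_t clear_bytes01_u(uint32_t x) { return x & clear_byte_mask(0) & clear_byte_mask(1); }
```

For example, `clear_byte_mask(0)` is `0xFFFFFF00` and `clear_byte1_u(0x12345678)` is `0x12340078`.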

The x86 architecture has partial registers, which can be zeroed to achieve the same effect (e.g. clear_byte0 can be implemented by zeroing al, assuming that x is stored in eax). However, doing so may not always be optimal. Compilers disagree on the best way to implement these functions, as shown in this godbolt example (GCC, clang and MSVC).

Clang and MSVC don't use partial registers at all; instead, they mimic the C code with an and instruction. They differ in that clang first moves the argument to eax and then applies the and, while MSVC applies the and first and then moves the result to eax (does the order make a difference?).

GCC chooses to zero the partial registers in all three functions, by xor-ing al, ah and ax, respectively. Is this better or worse than using and? Can it cause partial register stalls, or the insertion of additional merging uops later on?
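To pin down what GCC's sequences compute (as opposed to how fast they run), here is a value-level C model. It relies on the x86 rule that writing an 8- or 16-bit sub-register leaves the other bits of eax unchanged; all helper names are hypothetical. Masks and shifts are used instead of a union, so the host's byte order does not matter. The model only shows that `xor al, al`, `xor ah, ah` and `xor ax, ax` produce the same values as the three `and` masks; stalls and merging uops are microarchitectural effects it cannot capture.

```c
#include <assert.h>
#include <stdint.h>

/* Writing a sub-register of eax keeps all other bits of eax. */
static uint32_t write_al(uint32_t eax, uint8_t v)  { return (eax & 0xFFFFFF00u) | v; }
static uint32_t write_ah(uint32_t eax, uint8_t v)  { return (eax & 0xFFFF00FFu) | ((uint32_t)v << 8); }
static uint32_t write_ax(uint32_t eax, uint16_t v) { return (eax & 0xFFFF0000u) | v; }

static uint8_t  read_al(uint32_t eax) { return (uint8_t)eax; }
static uint8_t  read_ah(uint32_t eax) { return (uint8_t)(eax >> 8); }
static uint16_t read_ax(uint32_t eax) { return (uint16_t)eax; }

/* GCC's choices: xor al, al / xor ah, ah / xor ax, ax */
static uint32_t xor_al_al(uint32_t eax) { return write_al(eax, (uint8_t)(read_al(eax) ^ read_al(eax))); }
static uint32_t xor_ah_ah(uint32_t eax) { return write_ah(eax, (uint8_t)(read_ah(eax) ^ read_ah(eax))); }
static uint32_t xor_ax_ax(uint32_t eax) { return write_ax(eax, (uint16_t)(read_ax(eax) ^ read_ax(eax))); }

/* Each xor form yields the same value as the corresponding and. */
static void check_equivalence(uint32_t x)
{
    assert(xor_al_al(x) == (x & 0xFFFFFF00u));  /* clear_byte0   */
    assert(xor_ah_ah(x) == (x & 0xFFFF00FFu));  /* clear_byte1   */
    assert(xor_ax_ax(x) == (x & 0xFFFF0000u));  /* clear_bytes01 */
}
```

So the question is purely one of performance: both approaches are architecturally equivalent.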

Another peculiarity in GCC's behavior is the way it zeroes the "high 8" register (ah). In most cases, it uses xor ah, ah, which is consistent with everything else. However, for some reason, when the source register is ecx or edx (probably ebx too), meaning that it has access to ch/dh, it insists on using mov eax, ecx followed by xor ah, ch (instead of xor ah, ah). Does this mean that xor reg8, reg8 has a penalty if the source and destination are the same?
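The reason `xor ah, ch` still zeroes ah is that right after `mov eax, ecx`, ah holds the same byte as ch. The sketch below models both sequences on plain 32-bit values (helper names are hypothetical; `ecx` holds x and `eax` receives the result). The two sequences differ only in which register the xor reads, so any performance difference comes down to how a given CPU tracks those register dependencies, which this model does not capture.

```c
#include <assert.h>
#include <stdint.h>

/* ah / ch / dh: bits 8..15 of the full register. */
static uint8_t high8(uint32_t r) { return (uint8_t)(r >> 8); }

/* Writing the high-8 register leaves the other bits unchanged. */
static uint32_t write_high8(uint32_t r, uint8_t v)
{
    return (r & 0xFFFF00FFu) | ((uint32_t)v << 8);
}

/* GCC's sequence: mov eax, ecx ; xor ah, ch */
static uint32_t gcc_sequence(uint32_t ecx)
{
    uint32_t eax = ecx;                                          /* mov eax, ecx */
    eax = write_high8(eax, (uint8_t)(high8(eax) ^ high8(ecx)));  /* xor ah, ch   */
    return eax;
}

/* Alternative: mov eax, ecx ; xor ah, ah */
static uint32_t self_xor_sequence(uint32_t ecx)
{
    uint32_t eax = ecx;                                          /* mov eax, ecx */
    eax = write_high8(eax, (uint8_t)(high8(eax) ^ high8(eax)));  /* xor ah, ah   */
    return eax;
}

/* Both sequences compute clear_byte1(x), i.e. x & 0xFFFF00FF. */
static void check_same_result(uint32_t x)
{
    assert(gcc_sequence(x) == (x & 0xFFFF00FFu));
    assert(self_xor_sequence(x) == (x & 0xFFFF00FFu));
}
```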

Which of these approaches is best (and on which architectures)?
