Albert Netymk

Reputation: 1110

byte order in xmm clang assembly comments

Given the following program:

#include "emmintrin.h"

int main(int argc, char *argv[])
{
    volatile __m128i x = _mm_set_epi64x(1, 0);
    return 0;
}

I can get the assembly using clang -O -S test.c (only listing the interesting part):

...
movl    $1, %eax
movd    %rax, %xmm0
pslldq  $8, %xmm0               # xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
...

According to the documentation for _mm_set_epi64x, %xmm0 should be [0, 1, 0, 0], where each element is a 32-bit integer.

However, according to the comment, %xmm0 holds [0, 0, 0, 1]. I don't think endianness is relevant here, since I am only looking at a register.

I suspect that it's something related to the notation used in clang's assembly comments, but I can't find any useful information about it online.

== Edit:

I filed a bug report with clang.

Upvotes: 1

Views: 172

Answers (2)

Peter Cordes

Reputation: 365332

The comment appears to be describing the operation of pslldq in terms of the previous contents of xmm0 (even though those are known at compile time).

It seems to list elements in the reverse of the usual high-element-first order ([ 3 2 1 0 ]) that _mm_set uses; it's that high-first order that makes "left" shifts make sense.

It's the byte order you'd see in memory if you stored the vector.

I forget if that's typical for clang, and I don't have time right now to check another example.

Upvotes: 1

fuz

Reputation: 93117

The clang code loads the value in two steps. First, the value 1 is loaded into the lower 64 bits of the register. Then pslldq shifts the entire register left by 8 bytes (64 bits), so the value 1 ends up in the high 64 bits, just as your code specifies.

Upvotes: 1
