alibag90

Reputation: 17

How do I "fix unknown register name ‘%xmm1’ in ‘asm’" from the clobber list?

I am trying to build a project on Ubuntu 14.04 (x86), and I get the following errors:

error: unknown register name ‘%xmm1’ in ‘asm’
     asm volatile (
     ^
error: unknown register name ‘%xmm0’ in ‘asm’
error: unknown register name ‘%mm1’ in ‘asm’
error: unknown register name ‘%mm0’ in ‘asm’
error: unknown register name ‘%xmm0’ in ‘asm’
     asm volatile (
     ^
error: unknown register name ‘%mm0’ in ‘asm’
     asm volatile (

in this function:

static inline void
hev_bytes_xor_sse (guint8 *data, gsize size, guint8 byte)
{
    gsize i = 0, c = 0, p128 = 0, p64 = 0;
    guint64 w = (byte << 8) | byte;

    asm volatile (
        "movq %0, %%mm0\t\n"
        "pshufw $0x00, %%mm0, %%mm1\t\n"
        "movq2dq %%mm1, %%xmm0\t\n"
        "pshufd $0x00, %%xmm0, %%xmm1\t\n"
        ::"m"(w)
        :"%mm0", "%mm1", "%xmm0", "%xmm1"
    );

GCC version 4.8.2

Upvotes: 2

Views: 4514

Answers (2)

Peter Cordes

Reputation: 365312

GCC (since at least 4.0) doesn't allow clobbers for registers it's not allowed to touch with the current target settings.
Many distros configure GCC so the -m32 default is i686 (-march=pentiumpro allowing cmov), so -m32 implies -mno-sse. In kernel code, -mgeneral-regs-only would also be a problem even with -m64.

GCC 3.4.6 on Godbolt does compile this even with -mno-sse.
GCC 10 and later improve the message to:
error: the register 'xmm1' cannot be clobbered in 'asm' for the current target

You could use #ifdef __SSE__ around the XMM clobbers.

Or tell GCC it is allowed to touch XMM registers with -msse (or even -msse2 since you're using SSE2 instructions anyway). Or just in that function with __attribute__((target("sse2"))) but that will block inlining into callers with different target settings.
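The per-function option could look roughly like the sketch below. It assumes GCC or Clang; the name xor_bytes_sse2, the helper macros and the loop structure are illustrative and not taken from the project in the question. Each asm statement is self-contained, so nothing relies on an XMM register keeping its value between statements.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#define USE_SSE2_ASM 1
/* Enable SSE2 for this one function only, whatever -m flags the file uses. */
#define TARGET_SSE2 __attribute__((target("sse2")))
#else
#define USE_SSE2_ASM 0
#define TARGET_SSE2 /* not x86: plain C only */
#endif

/* Hypothetical helper: XOR every byte of data[0..size) with `byte`. */
TARGET_SSE2
void xor_bytes_sse2(uint8_t *data, size_t size, uint8_t byte)
{
    size_t i = 0;

#if USE_SSE2_ASM
    uint32_t dw = 0x01010101u * byte;   /* byte repeated in all 4 bytes */

    for (; i + 16 <= size; i += 16) {
        /* __asm__ is the spelling that also works with -std=c99 */
        __asm__ volatile (
            "movd    %1, %%xmm0\n\t"             /* xmm0 = dw in the low dword */
            "pshufd  $0x00, %%xmm0, %%xmm0\n\t"  /* broadcast to all 4 dwords  */
            "movdqu  (%0), %%xmm1\n\t"           /* load 16 bytes (unaligned)  */
            "pxor    %%xmm0, %%xmm1\n\t"
            "movdqu  %%xmm1, (%0)\n\t"           /* store them back            */
            :
            : "r"(data + i), "r"(dw)
            : "xmm0", "xmm1", "memory");
    }
#endif

    /* scalar tail (and the whole job on non-x86 targets) */
    for (; i < size; i++)
        data[i] ^= byte;
}
```

Because the function carries its own target attribute, the XMM clobbers are accepted even when the rest of the file is built with -mno-sse, at the cost of the inlining restriction mentioned above.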

Omitting the clobbers and using registers you haven't told the compiler about feels wrong, but as long as you're only doing that when the compiler definitely won't be using those registers, you won't be stepping on its toes. XMM registers were new with SSE1, so that's what matters for an ifdef.
But if you are telling the compiler it can use XMM registers, you might as well let it use the same SSE feature level your inline asm does, SSE2 in this case.

In this case it doesn't need to generate any SSE instructions before or after your asm statement because all vector regs are call-clobbered in i386 and AMD64 System V calling conventions. But for example Windows x64 has XMM6-15 call-preserved.
An XMM6 clobber on Win x64 would require the compiler to save/restore it to follow the calling convention, but -mgeneral-regs-only or -mno-sse wouldn't let it use any instructions that can do that. I'm guessing things like this on various architectures are why GCC added checks on which registers you can clobber. It would be nice if the compiler internals were smart enough to check the specific register against the calling convention and only error if it was call-preserved.

Your current code would need emms before x87 FP math will work again, but it seems GCC doesn't do that for you after code that clobbers MMX regs, unlike if you used MMX intrinsics. (In this case you should use pshuflw in an XMM reg, or an integer multiply by 0x01010101, to avoid MMX entirely.)

// one-off usage:
asm("..."
  : outputs
  : inputs
  :  // clobbers
#ifdef __MMX__
    "mm0", "mm1"
#endif
#ifdef __SSE__
    ,"xmm0", "xmm1"  // leading/trailing comma not allowed, but SSE implies MMX
#endif
  );

Or with macros to reduce noise if you have multiple asm statements:


#ifdef __SSE__
#define XMM_CLOBBERS(...) __VA_ARGS__
#else
#define XMM_CLOBBERS(...)  /* empty */
#endif
// and same for MMX, but you don't need MMX for this.

asm("..."
  : outputs
  : inputs
  : MMX_CLOBBERS("mm0", "mm1")  XMM_CLOBBERS(, "xmm0", "xmm1")
  );

Unsolved problem: how to avoid leading or trailing commas in the list? I guess MMX_CLOBBERS("mm0", "mm1") XMM_CLOBBERS(, "xmm0", "xmm1") since again, you can't have SSE without MMX. (SSE1 includes some new instructions on MMX registers, like pshufw.)

For a case with two independent ISA extensions like MPX (bnd registers) and SSE, you might need a macro like MPX_AND_SSE_COMMA which expands to a comma only if both are defined? Seems like a mess. Fortunately most extensions that add new registers imply previous extensions. If GCC allowed trailing commas in clobber or operand lists, this would be a lot simpler.

And for AVX I think clobbering XMM0 is sufficient for YMM0; the compiler won't assume the high half of YMM0 is still unmodified, and won't pick YMM0 for any input or output operands.
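One possible way around the comma problem, offered as a sketch and not as an established idiom: start the list with a clobber that is accepted on every target, such as "cc" (or "memory" when the asm really touches memory), and give every optional group a leading comma. The macro names follow the ones above; clobber_demo is a made-up function whose asm body is empty.

```c
#ifdef __MMX__
#define MMX_CLOBBERS(...) __VA_ARGS__
#else
#define MMX_CLOBBERS(...) /* empty */
#endif

#ifdef __SSE__
#define XMM_CLOBBERS(...) __VA_ARGS__
#else
#define XMM_CLOBBERS(...) /* empty */
#endif

/* Made-up demo: no instructions are emitted, only the clobber list matters. */
int clobber_demo(int x)
{
    __asm__ volatile (
        ""                      /* empty template */
        : "+r"(x)               /* pretend the asm reads and writes x */
        :
        : "cc"                  /* always-present first entry */
          MMX_CLOBBERS(, "mm0", "mm1")
          XMM_CLOBBERS(, "xmm0", "xmm1"));
    return x;
}
```

With this layout each group can be present or absent independently, since no group has to know whether another one precedes it.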


You don't need MMX to broadcast a byte

    uint32_t dw = 0x01010101u * byte;  // broadcast to dword
   
    asm volatile (
        "movd    %0, %%xmm0\t\n"
        "pshufd  $0x00, %%xmm0, %%xmm1\t\n"
        ::"r"(dw)
        : "memory"  // assuming data is an input to the real asm and you deref it.
          XMM_CLOBBERS(, "xmm0", "xmm1")
              // note the leading comma inside this, to separate from "memory"
    );

(Godbolt showing it compiling with -mno-sse and with -m64)
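The multiply trick is plain C arithmetic and can be checked without any asm. A small sketch, with illustrative function names:

```c
#include <stdint.h>

/* Repeat `byte` in all four bytes of a 32-bit word. */
uint32_t broadcast_byte32(uint8_t byte)
{
    return 0x01010101u * byte;
}

/* Same idea for a 64-bit word. */
uint64_t broadcast_byte64(uint8_t byte)
{
    return 0x0101010101010101ull * byte;
}
```

Each 01 byte in the constant contributes one shifted copy of the input, and the copies never overlap, so broadcast_byte32(0xAB) gives 0xABABABAB.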

Or with SSSE3, pxor %%xmm1, %%xmm1 to broadcast byte #0 in-place with pshufb %%xmm1, %%xmm0. If SSE2 was enabled in the compiler, you could use "x"(0) to ask the compiler for a zeroed register that could be reused across asm statements without having to re-run the pxor.

Or of course use intrinsics and let the compiler use these tricks for you with _mm_set1_epi8(byte), with -msse2 or -mssse3 enabled. (Or -march=x86-64-v2 for SSE4.2 + popcnt is probably good these days for CPUs without AVX)
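A sketch of the intrinsics route, assuming <emmintrin.h> is available when __SSE2__ is defined and falling back to plain C otherwise. The function name xor_bytes_intrin is illustrative. The compiler picks the registers itself, so there is no clobber list to maintain.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Illustrative helper: XOR every byte of data[0..size) with `byte`. */
void xor_bytes_intrin(uint8_t *data, size_t size, uint8_t byte)
{
    size_t i = 0;

#ifdef __SSE2__
    /* The compiler chooses how to broadcast the byte. */
    const __m128i mask = _mm_set1_epi8((char)byte);

    for (; i + 16 <= size; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
        _mm_storeu_si128((__m128i *)(data + i), _mm_xor_si128(v, mask));
    }
#endif

    /* scalar tail, or the whole buffer when SSE2 is not enabled */
    for (; i < size; i++)
        data[i] ^= byte;
}
```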

Upvotes: 0

a3f

Reputation: 8657

You can check what target-specific options gcc uses, using: gcc -Q --help=target. gcc -O3 -Q --help=target does the same but for -O3.

In order to compile this code, you need to have -mmmx and -msse2 enabled. If they are not enabled by default, you can pass them on the command line: gcc -mmmx -msse2 ....
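The same information is also visible from inside the source through the predefined macros __MMX__, __SSE__ and __SSE2__, which GCC defines when the corresponding option is in effect. A small sketch that reports them (the enum and function names are made up for this example):

```c
/* Bit flags for the report below (names invented for this sketch). */
enum { HAVE_MMX = 1, HAVE_SSE = 2, HAVE_SSE2 = 4 };

/* Returns which of the extensions were enabled when this file was compiled. */
unsigned simd_enabled_at_compile_time(void)
{
    unsigned flags = 0;
#ifdef __MMX__
    flags |= HAVE_MMX;
#endif
#ifdef __SSE__
    flags |= HAVE_SSE;
#endif
#ifdef __SSE2__
    flags |= HAVE_SSE2;
#endif
    return flags;
}
```

Building this once with the project's normal flags and once with -mmmx -msse2 added should show whether the defaults were the cause of the error.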

Upvotes: 2
