Reputation: 17
I am trying to build a project for Ubuntu 14.04 x86, and I've got the following error:
error: unknown register name ‘%xmm1’ in ‘asm’
asm volatile (
^
error: unknown register name ‘%xmm0’ in ‘asm’
error: unknown register name ‘%mm1’ in ‘asm’
error: unknown register name ‘%mm0’ in ‘asm’
error: unknown register name ‘%xmm0’ in ‘asm’
asm volatile (
^
error: unknown register name ‘%mm0’ in ‘asm’
asm volatile (
In this function:
static inline void
hev_bytes_xor_sse (guint8 *data, gsize size, guint8 byte)
{
gsize i = 0, c = 0, p128 = 0, p64 = 0;
guint64 w = (byte << 8) | byte;
asm volatile (
"movq %0, %%mm0\t\n"
"pshufw $0x00, %%mm0, %%mm1\t\n"
"movq2dq %%mm1, %%xmm0\t\n"
"pshufd $0x00, %%xmm0, %%xmm1\t\n"
::"m"(w)
:"%mm0", "%mm1", "%xmm0", "%xmm1"
);
GCC version 4.8.2
Upvotes: 2
Views: 4514
Reputation: 365312
GCC (since at least 4.0) doesn't allow clobbers for registers it's not allowed to touch with the current target settings. Many distros configure GCC so the -m32 default is i686 (-march=pentiumpro, allowing cmov), so -m32 implies -mno-sse. In kernel code, -mgeneral-regs-only would also be a problem even with -m64.
GCC 3.4.6 on Godbolt does compile this even with -mno-sse. GCC 10 and later improve the message to:
error: the register 'xmm1' cannot be clobbered in 'asm' for the current target
You could use #ifdef __SSE__ around the XMM clobbers. Or tell GCC it is allowed to touch XMM registers with -msse (or even -msse2, since you're using SSE2 instructions anyway). Or enable it just for that function with __attribute__((target("sse2"))), but that will block inlining into callers with different target settings.
Omitting the clobbers and using registers you haven't told the compiler about feels wrong, but as long as you only do that when the compiler definitely won't be using those registers, you won't be stepping on its toes. XMM registers were new with SSE1, so that's what matters for an #ifdef.
But if you are telling the compiler it can use XMM registers, you might as well let it use the same SSE feature level your inline asm does, SSE2 in this case.
In this case it doesn't need to generate any SSE instructions before or after your asm statement, because all vector regs are call-clobbered in the i386 and AMD64 System V calling conventions. But Windows x64, for example, has XMM6-15 call-preserved. An XMM6 clobber on Win x64 would require the compiler to save/restore it to follow the calling convention, but -mgeneral-regs-only or -mno-sse wouldn't let it use any instructions that can do that. I'm guessing things like this on various architectures are why GCC added checks on which registers you can clobber. It would be nice if the compiler internals were smart enough to check the specific register against the calling convention and only error if it was call-preserved.
Your current code would need emms before x87 FP math will work again, but it seems GCC doesn't do that for you after code that clobbers MMX regs, unlike if you used MMX intrinsics. (In this case you should use pshuflw in an XMM reg, or an integer multiply by 0x01010101, to avoid MMX entirely.)
// one-off usage:
asm("..."
: outputs
: inputs
: // clobbers
#ifdef __MMX__
"mm0", "mm1"
#endif
#ifdef __SSE__
,"xmm0", "xmm1" // leading/trailing comma not allowed, but SSE implies MMX
#endif
);
Or with macros to reduce noise if you have multiple asm statements:
#ifdef __SSE__
#define XMM_CLOBBERS(...) __VA_ARGS__
#else
#define XMM_CLOBBERS(...) /* empty */
#endif
// and same for MMX, but you don't need MMX for this.
asm("..."
: outputs
: inputs
: MMX_CLOBBERS("mm0", "mm1") XMM_CLOBBERS(, "xmm0", "xmm1")
);
Unsolved problem: how to avoid leading or trailing commas in the list? I guess MMX_CLOBBERS("mm0", "mm1") XMM_CLOBBERS(, "xmm0", "xmm1"), since again, you can't have SSE without MMX. (SSE1 includes some new instructions on MMX registers, like pshufw.) For a case with two independent ISA extensions like MPX (bnd registers) and SSE, you might need a macro like MPX_AND_SSE_COMMA which is defined as , only if both are defined? Seems like a mess. Fortunately most extensions that add new registers imply previous extensions. And for AVX I think clobbering XMM0 is sufficient for YMM0; the compiler won't assume the high half of YMM0 is still unmodified, and won't pick YMM0 for any input or output operands. If GCC allowed trailing commas in clobber or operand lists, this would be a lot simpler.
uint32_t dw = 0x01010101u * byte; // broadcast to dword
asm volatile (
"movd %0, %%xmm0\n\t"
"pshufd $0x00, %%xmm0, %%xmm1\n\t"
::"r"(dw)
: "memory" // assuming data is an input to the real asm and you deref it.
XMM_CLOBBERS(, "xmm0", "xmm1")
// note the leading comma inside this, to separate from "memory"
);
(Godbolt showing it compiling with -mno-sse and with -m64.)
Or with SSSE3, use pxor %%xmm1, %%xmm1 to zero a control mask, then pshufb %%xmm1, %%xmm0 broadcasts byte 0 of xmm0 in place. If SSE2 was enabled in the compiler, you could use "x"(0) to ask the compiler for a zeroed register that could be reused across asm statements, without having to re-run the pxor.
Or of course use intrinsics and let the compiler use these tricks for you with _mm_set1_epi8(byte), with -msse2 or -mssse3 enabled. (Or -march=x86-64-v2 for SSE4.2 + popcnt is probably good these days for CPUs without AVX.)
Upvotes: 0
Reputation: 8657
You can check which target-specific options gcc uses with gcc -Q --help=target. Adding optimization flags, e.g. gcc -O3 -Q --help=target, shows the options in effect for -O3.
In order to compile this code, you need to have -mmmx and -msse2 enabled. If that's not the case, you can just pass them on the command line: gcc -mmmx -msse2 ...
Upvotes: 2