Reputation: 121
I tried to compile this overflow-detection macro from the Zend Engine:
#define ZEND_SIGNED_MULTIPLY_LONG(a, b, lval, dval, usedval) do { \
long __tmpvar; \
__asm__( \
"mul %0, %2, %3\n" \
"smulh %1, %2, %3\n" \
"sub %1, %1, %0, asr #63\n" \
: "=X"(__tmpvar), "=X"(usedval) \
: "X"(a), "X"(b)); \
if (usedval) (dval) = (double) (a) * (double) (b); \
else (lval) = __tmpvar; \
} while (0)
And got this result in assembly:
; InlineAsm Start
mul x8, x8, x9
smulh x9, x8, x9
sub x9, x9, x8, asr #63
; InlineAsm End
The compiler used only 2 registers for both the inputs and the outputs of the macro. I think it needs at least 3, because reusing them leads to a wrong result (for example, for -1 * -1). Any suggestions?
Upvotes: 2
Views: 471
Reputation: 71889
The assembly code is buggy. From GCC's documentation on extended asm:
Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs. This assumption may be false if the assembler code actually consists of more than one instruction.
This basically says that from the moment you write to an output parameter not marked with an ampersand, you're not allowed to use the input parameters anymore because they might have been overwritten.
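To make the failure concrete, here is a rough C model of the three instructions, replayed once with separate output registers and once with the register assignment from the question's listing. It is only a sketch: the function names are made up for illustration, and it assumes a compiler with the __int128 extension (GCC or Clang) and an arithmetic right shift for negative values.

```c
#include <stdint.h>

/* High 64 bits of the signed 128-bit product, i.e. what smulh computes.
   Assumes the __int128 extension (GCC/Clang). */
static int64_t smulh_model(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * (__int128)b) >> 64);
}

/* Low 64 bits of the product, i.e. what mul computes.
   The multiply is done in unsigned arithmetic to avoid signed-overflow UB. */
static int64_t mul_model(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a * (uint64_t)b);
}

/* Intended behaviour: outputs live in registers distinct from the inputs. */
static int64_t overflow_flag_intended(int64_t a, int64_t b)
{
    int64_t lo = mul_model(a, b);
    int64_t hi = smulh_model(a, b);
    return hi - (lo >> 63);          /* 0 means the product fits in 64 bits */
}

/* Register assignment from the question: x8 holds a and tmp, x9 holds b and usedval. */
static int64_t overflow_flag_clobbered(int64_t a, int64_t b)
{
    int64_t x8 = a, x9 = b;
    x8 = mul_model(x8, x9);          /* mul   x8, x8, x9   -> a is overwritten */
    x9 = smulh_model(x8, x9);        /* smulh x9, x8, x9   -> reads the product, not a */
    x9 = x9 - (x8 >> 63);            /* sub   x9, x9, x8, asr #63 */
    return x9;
}
```

For -1 * -1 the intended version yields 0 (no overflow), while the clobbered version computes smulh on 1 and -1, gets -1, and so reports an overflow that did not happen.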
Upvotes: 5
Reputation: 363882
The syntax is designed around the concept of wrapping a single insn which reads its inputs before writing its outputs.
When you use multiple insns, you often need to use an early-clobber modifier on the constraint ("=&x") to let the compiler know you write an output or read-write register before reading all the inputs. Then it will make sure that output register isn't the same register as any of the input registers.
See also the x86 tag wiki, and my collection of inline asm docs and SO answers at the bottom of this answer.
#define ZEND_SIGNED_MULTIPLY_LONG(a, b, lval, dval, usedval) do { \
long __tmpvar; \
__asm__( \
"mul %[tmp], %[a], %[b]\n\t" \
"smulh %[uv], %[a], %[b]\n\t" \
"sub %[uv], %[uv], %[tmp], asr #63\n" \
: [tmp] "=&X"(__tmpvar), [uv] "=&X"(usedval) \
: [a] "X"(a), [b] "X"(b)); \
if (usedval) (dval) = (double) (a) * (double) (b); \
else (lval) = __tmpvar; \
} while (0)
Do you really need all those instructions to be inside the inline asm? Can't you make long tmp = a * b an input operand? Then if the compiler needs a*b elsewhere in the function, CSE can see it.
You can convince gcc to broadcast the sign bit with an arithmetic right shift using a ternary operator. So hopefully you can coax the compiler to do the sub that way. Then it could use subs to set flags from the sub instead of needing a separate test insn on usedval.
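As a rough sketch of that idea, the low half and the sign broadcast can be written in plain C, leaving only a single smulh inside the asm statement. With one instruction the inputs are read before the output is written, so no early-clobber is needed. The helper name signed_mul_overflows is hypothetical, the "r" constraints are my choice rather than the macro's, and the non-AArch64 fallback assumes the __int128 extension so the code can be tried on other targets. Whether the compiler actually emits an asr or subs for this is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if a * b does not fit in 64 bits.
   The wrapped low 64 bits of the product are stored in *lo either way. */
static int signed_mul_overflows(int64_t a, int64_t b, int64_t *lo)
{
    int64_t hi;

    /* Low half in plain C, so the compiler can CSE it with other uses.
       Unsigned arithmetic avoids signed-overflow undefined behaviour. */
    *lo = (int64_t)((uint64_t)a * (uint64_t)b);

#if defined(__aarch64__)
    /* Single instruction: no output is written before the inputs are read. */
    __asm__("smulh %0, %1, %2" : "=r"(hi) : "r"(a), "r"(b));
#else
    /* Fallback for other targets, assuming the __int128 extension. */
    hi = (int64_t)(((__int128)a * (__int128)b) >> 64);
#endif

    /* Sign broadcast via a ternary: all-ones if lo is negative, else zero. */
    int64_t sign = (*lo < 0) ? -1 : 0;

    /* The product fits iff the high half equals the sign extension of the low half. */
    return (hi - sign) != 0;
}
```

A caller would then take the double path when the function returns nonzero and use *lo otherwise, mirroring the if/else at the end of the macro.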
If you can't get your target compiler to make the code you want, then sure, give inline asm a shot. But beware, I've seen clang be a lot worse than gcc with inline asm. It tends to make worse code around the inline asm on x86.
Upvotes: 4