Inline assembly code and storing 128-bit result

Question

I need to write the following statement as inline assembly in a C program compiled with GCC on Ubuntu.

__int128 X = (__int128)F[0]*T[0] + (__int128)F[1]*T[1] + (__int128)F[2]*T[2];

Here F is an array of unsigned 64-bit integers and T is an array of signed 64-bit integers. F is passed by reference as an argument, while T is a local array. I have translated the above statement as follows:

__asm__("movq %0, %%rax;        imulq  %1;  movq %%rax, %%xmm0; movq %%rdx, %%xmm1;"
        ::"m"(F[0]), "m"(T[0]));
__asm__("movq %0, %%rax;        imulq  %1;  movq %%xmm0, %%rcx; addq %%rcx, %%rax;" 
        "movq %%rax, %%xmm0;                movq %%xmm1, %%rcx; adcq %%rcx, %%rdx;"
        "movq %%rdx, %%xmm1;"
        ::"m"(F[1]), "m"(T[1]));
__asm__("movq %2, %%rax;        imulq  %3;  movq %%xmm0, %%rcx; addq %%rcx, %%rax;" 
        "movq %%rax, %?;                    movq %%xmm1, %%rcx; adcq %%rcx, %%rdx;"
        "movq %%rdx, %?;"
        :"=m"(??), "=m"(??):"m"(F[2]), "m"(T[2]));

My first question is: am I doing this correctly? If so, I don't know how to store the result into X, because the lower 64 bits of the result are in rax and the upper 64 bits are in rdx. I have checked that if I replace ?? with X, I get the wrong result.

The xmm registers are used only for storage. I am new to inline assembly, so I suspect there are better ways of doing this. I have checked my program with the above inline assembly and it produces no errors. Any help or suggestions for improvement will be highly appreciated.

Timothy Baldwin · Accepted Answer

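You are sign-extending F: imulq is a signed multiply, so the unsigned values in F are treated as signed. Since there isn't a signed * unsigned multiply instruction, the adjustment has to be done explicitly. A 16-bit to 32-bit example, where S is the 16-bit pattern of a negative value (so 0xFFFF0000 + S is its sign extension to 32 bits) and U is unsigned: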

(0xFFFF0000 + S) * U
= 0xFFFF0000 * U + S * U
= (0x100000000 - 0x10000) * U + S * U
= 0x100000000 * U - 0x10000 * U + S * U
= S * U - 0x10000 * U  (don't care about high bits)
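
The identity above can be checked in plain C. The following is a small sketch, not part of the original answer: `mul_s16_u16` is a hypothetical helper name, and the final cast assumes that converting an out-of-range `uint32_t` to `int32_t` wraps around, as GCC documents.

```c
#include <stdint.h>

/* Hypothetical helper (name chosen for illustration): signed 16-bit S times
   unsigned 16-bit U, using only an unsigned multiply plus the correction. */
static int32_t mul_s16_u16(int16_t s, uint16_t u)
{
    /* Unsigned 16x16 -> 32 multiply; S is reinterpreted as its bit pattern. */
    uint32_t prod = (uint32_t)(uint16_t)s * (uint32_t)u;

    /* For negative S the bit pattern is 0x10000 too large, so subtract
       0x10000 * U; the arithmetic wraps modulo 2^32. */
    if (s < 0)
        prod -= (uint32_t)u << 16;

    /* Assumption: conversion to int32_t wraps (GCC behaviour). */
    return (int32_t)prod;
}
```

The 64-bit case works the same way, with the correction applied to the high half of the 128-bit product.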

You can't rely on values remaining in registers between blocks of inline asm statements; you must use variables. All modified registers must be declared as either outputs or clobbers.

For example, a single multiplication of U, a 64-bit unsigned value, and S, a 64-bit signed value:

__int128 X;
uint64_t Utmp = U;
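asm ("mov %1, %%rax;"      /* rax = U */
     "mul %2;"             /* rdx:rax = U * S, with S taken as unsigned */
     "test %2, %2;"        /* set the sign flag from S */
     "cmovns %3, %1;"      /* if S >= 0, the correction is 0 instead of U */
     "sub %1, %%rdx"       /* high half -= correction */
     : "=&A" (X), "+r" (Utmp) : "r" (S), "rm" (0UL));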
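
If the two halves are wanted in separate variables rather than in a single `__int128`, one possible variant declares rax and rdx as two outputs and combines them in C. This is a sketch under assumptions, not the answer's code: `mul_s64_u64` is a hypothetical name, the asm path is only compiled on x86-64 with a GCC-compatible compiler, and a plain C fallback is included for other targets.

```c
#include <stdint.h>

/* Hypothetical helper: signed 64-bit S times unsigned 64-bit U as a
   signed 128-bit result. */
static __int128 mul_s64_u64(int64_t s, uint64_t u)
{
    /* Correction term: U if S is negative, otherwise 0. */
    uint64_t corr = (s < 0) ? u : 0;
#if defined(__x86_64__)
    uint64_t lo, hi;
    /* mul: rdx:rax = rax * operand, unsigned. Both result registers are
       declared as outputs, so the compiler knows they are modified. */
    __asm__ ("mulq %3"
             : "=a" (lo), "=d" (hi)
             : "a" ((uint64_t)s), "r" (u)
             : "cc");
    hi -= corr;   /* same correction as the sub on rdx above */
    return (__int128)(((unsigned __int128)hi << 64) | lo);
#else
    /* Portable fallback for targets other than x86-64. */
    unsigned __int128 p = (unsigned __int128)(uint64_t)s * u;
    p -= (unsigned __int128)corr << 64;
    return (__int128)p;
#endif
}
```

Because every value crosses the asm boundary through a declared operand, nothing depends on registers surviving between separate asm statements.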

Edit: It can be done without a zero input:

int64_t Stmp = S;
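asm ("mov %1, %%rax;"      /* rax = S */
     "mul %2;"             /* rdx:rax = S * U, with S taken as unsigned */
     "sarq $63, %1;"       /* Stmp = all ones if S < 0, else 0 */
     "and %2, %1;"         /* Stmp = U if S < 0, else 0 */
     "sub %1, %%rdx"       /* high half -= correction */
     : "=&A" (X), "+rm" (Stmp) : "r" (U));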
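
For the three-term sum from the question, the same sar/and/sub sequence can be mirrored in portable C instead of inline assembly. This is only a sketch: the names `mul_su` and `dot3` are hypothetical, the accumulation is done in `unsigned __int128` so that an overflowing sum wraps rather than being undefined, and right-shifting a negative `int64_t` is assumed to be an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned U times signed S, returned modulo 2^128. */
static unsigned __int128 mul_su(uint64_t u, int64_t s)
{
    /* Unsigned product of the raw bit patterns (what mul computes). */
    unsigned __int128 p = (unsigned __int128)u * (uint64_t)s;

    /* sar $63 / and: the mask is all ones when S < 0, so corr is U or 0.
       Assumption: >> on a negative int64_t is an arithmetic shift (GCC). */
    uint64_t corr = (uint64_t)(s >> 63) & u;

    /* sub from the high half (rdx in the asm version). */
    return p - ((unsigned __int128)corr << 64);
}

/* Hypothetical wrapper for F[0]*T[0] + F[1]*T[1] + F[2]*T[2]. */
static __int128 dot3(const uint64_t F[3], const int64_t T[3])
{
    unsigned __int128 acc = 0;
    for (int i = 0; i < 3; i++)
        acc += mul_su(F[i], T[i]);
    return (__int128)acc;
}
```

With this, `__int128 X = dot3(F, T);` stores the result directly, and the output can be compared against the asm version when testing.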
