Reputation: 163
I have to write the following statement as inline assembly in my C program, using GCC on Ubuntu.
__int128 X = (__int128)F[0]*T[0]+(__int128)F[1]*T[1]+(__int128)F[2]*T[2];
Here F is an array of unsigned 64-bit integers and T is an array of signed 64-bit integers. F is passed by reference as an argument, while T is a local array. I have translated the above statement as follows:
__asm__("movq %0, %%rax; imulq %1; movq %%rax, %%xmm0; movq %%rdx, %%xmm1;"
::"m"(F[0]), "m"(T[0]));
__asm__("movq %0, %%rax; imulq %1; movq %%xmm0, %%rcx; addq %%rcx, %%rax;"
"movq %%rax, %%xmm0; movq %%xmm1, %%rcx; adcq %%rcx, %%rdx;"
"movq %%rdx, %%xmm1;"
::"m"(F[1]), "m"(T[1]));
__asm__("movq %2, %%rax; imulq %3; movq %%xmm0, %%rcx; addq %%rcx, %%rax;"
"movq %%rax, %?; movq %%xmm1, %%rcx; adcq %%rcx, %%rdx;"
"movq %%rdx, %?;"
:"=m"(??), "=m"(??):"m"(F[2]), "m"(T[2]));
My first question is whether I am doing this correctly. If so, I don't know how to store the result into X, because the lower 64 bits of the result are in rax and the upper 64 bits are in rdx. I have checked that if I substitute X for ??, I get the wrong result.
I use the xmm registers purely as temporary storage. Since I am new to inline assembly, I suspect there are better ways of doing this. I have checked my program with the above inline assembly code and it produces no errors. Any help or suggestions for improvement would be highly appreciated.
Upvotes: 3
Views: 886
Reputation: 3675
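imulq treats both operands as signed, so you are effectively sign-extending F. Since x86 has no signed * unsigned multiply instruction, the sign extension has to be handled explicitly. A 16-bit to 32-bit example, where S is the bit pattern of a negative 16-bit value and U is an unsigned value: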
(0xFFFF0000 + S) * U
= 0xFFFF0000 * U + S * U
= (0x100000000 - 0x10000) * U + S * U
= 0x100000000 * U - 0x10000 * U + S * U
= S * U - 0x10000 * U (don't care about high bits)
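The same identity carries over to 64-bit operands and a 128-bit result. As a rough sketch in plain C (the helper name `mul_su64` is made up for illustration, and `__int128` is a GCC/Clang extension): multiply the raw bit patterns as unsigned values, then subtract U from the high half when S is negative.

```c
#include <stdint.h>

/* Hypothetical helper: signed 64-bit s times unsigned 64-bit u,
   built from an unsigned multiply plus the correction derived above. */
__int128 mul_su64(int64_t s, uint64_t u)
{
    /* Unsigned 64 x 64 -> 128 multiply of the raw bit patterns. */
    unsigned __int128 p = (unsigned __int128)(uint64_t)s * u;

    /* A negative s reads as s + 2^64 when treated as unsigned, so the
       product is too large by 2^64 * u: subtract u from the high half. */
    if (s < 0)
        p -= (unsigned __int128)u << 64;

    /* Out-of-range conversion wraps modulo 2^128 under GCC. */
    return (__int128)p;
}
```

This is only meant to check the arithmetic; it should agree with `(__int128)s * (__int128)u` for any inputs.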
You can't rely on values remaining in registers between blocks of inline asm statements; you must use variables. All modified registers must be declared as either outputs or clobbers.
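One possible way to follow that rule, sketched under some assumptions: each product gets its own asm statement, the two halves come back through output operands, and the running sum lives in a C variable instead of xmm registers. The names `mul_su64_asm` and `dot3_asm` are hypothetical, and the non-x86-64 branch is only a fallback so the sketch compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: one signed x unsigned 64-bit product, returned
   through output operands rather than left in registers. */
__int128 mul_su64_asm(int64_t s, uint64_t u)
{
#if defined(__x86_64__)
    uint64_t lo, hi;
    /* mulq multiplies rax by its operand and writes rdx:rax. */
    __asm__("mulq %3"
            : "=a"(lo), "=d"(hi)          /* outputs: low and high halves */
            : "a"((uint64_t)s), "rm"(u)   /* inputs: s in rax, u anywhere */
            : "cc");                      /* mul changes the flags */
    if (s < 0)
        hi -= u;                          /* signed correction on the high half */
    return (__int128)(((unsigned __int128)hi << 64) | lo);
#else
    /* Assumption: on other targets, fall back to plain C. */
    return (__int128)s * (__int128)u;
#endif
}

/* Hypothetical helper: the three-term sum, accumulated in a C variable. */
__int128 dot3_asm(const uint64_t F[3], const int64_t T[3])
{
    __int128 x = 0;
    for (int i = 0; i < 3; i++)
        x += mul_su64_asm(T[i], F[i]);
    return x;
}
```

Because every register the asm touches is named as an input or output, the compiler is free to schedule the surrounding code without losing intermediate values.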
For example, a single multiplication of U, a 64-bit unsigned value, and S, a 64-bit signed value:
__int128 X;
uint64_t Utmp = U;
asm ("mov %1, %%rax;"
"mul %2;"
"test %2, %2;"
"cmovns %3, %1;"
"sub %1, %%rdx"
: "=&A" (X), "+r" (Utmp) : "r" (S), "rm" (0UL));
Edit: It can be done without a zero input:
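int64_t Stmp = S;
asm ("mov %1, %%rax;"   /* rax = bit pattern of S */
"mul %2;"               /* rdx:rax = unsigned product */
"sarq $63, %1;"         /* all ones if S < 0, else 0; q suffix since %1 may be memory */
"and %2, %1;"           /* U if S < 0, else 0 */
"sub %1, %%rdx"         /* apply the correction to the high half */
: "=&A" (X), "+rm" (Stmp) : "r" (U));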
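For comparison, and as a reference to test any asm version against, the original statement can also be written in plain C. This is a minimal sketch with a made-up function name `dot3_c`; it relies on the usual conversions, where the cast widens the unsigned F value without changing it and the signed T value is sign-extended to `__int128`.

```c
#include <stdint.h>

/* Hypothetical reference version: the compiler picks the instructions. */
__int128 dot3_c(const uint64_t F[3], const int64_t T[3])
{
    return (__int128)F[0] * T[0]
         + (__int128)F[1] * T[1]
         + (__int128)F[2] * T[2];
}
```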
Upvotes: 4