Reputation: 123
The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.
typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y) {
*dest = x*y;
}
gcc generates the following assembly code implementing the computation: dest at %ebp+8, x at %ebp+12, y at %ebp+16
1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)
This code uses three multiplications to implement the multiprecision arithmetic required to implement 64-bit arithmetic on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm.
I don't understand line 8 and line 9 in assembly code above. Can anyone help?
Upvotes: 5
Views: 7455
Reputation: 62106
I've converted it to intel syntax.
mov esi, y_low
mov eax, x
mov edx, eax
sar edx, 31
mov ecx, y_high
imul ecx, eax ; ecx = y_high *{signed} x
mov ebx, edx
imul ebx, esi ; ebx = sign_extension(x) *{signed} y_low
add ecx, ebx ; ecx = y_high *{signed} x_low + x_high *{signed} y_low
mul esi ; edx:eax = x_low *{unsigned} y_low
lea edx, [ecx + edx] ; edx = high(x_low *{unsigned} y_low + y_high *{signed} x_low + x_high *{signed} y_low)
mov ecx, dest
mov [ecx], eax
mov [ecx + 4], edx
What the above code does is multiplication of 2 64-bit signed integers that keeps the least-significant 64 bits of the product.
Where does the other 64-bit multiplicand come from? It's x
sign-extended from 32 bits to 64. The sar
instruction is used to replicate x's
sign bit into all bits of edx
. I call this value consisting only of the x's
sign x_high
. x_low
is the value of x
actually passed into the routine.
y_low
and y_high
are the least and most significant parts of y
, just like x's
x_low
and x_high
are.
From here it's pretty easy:
product = y
*{signed} x
=
(y_high
* 232 + y_low
) *{signed} (x_high
* 232 + x_low
) =
y_high
*{signed} x_high
* 264 +
y_high
*{signed} x_low
* 232 +
y_low
*{signed} x_high
* 232 +
y_low
*{signed} x_low
y_high
*{signed} x_high
* 264 isn't calculated because it doesn't contribute to the least significant 64 bits of the product. We'd calculate it if we were interested in the full 128-bit product (full 96-bit product for the picky).
y_low
*{signed} x_low
is calculated using unsigned multiplication. It's legal to do so because 2's complement signed multiplication gives the same least significant bits as unsigned multiplication. Example:
-1 *{signed} -1 = 1
0xFFFFFFFFFFFFFFFF *{unsigned} 0xFFFFFFFFFFFFFFFF = 0xFFFFFFFFFFFFFFFE0000000000000001 (64 least significant bits are equivalent to 1)
Upvotes: 6
Reputation: 3736
Consider the context of line 8 and 9.
By this time, ESI
contains the lower half of y
and EBX
contains sgn(x)
. So line 8 is just computing sgn(x) * (y % 2^32)
and storing it in EBX
.
Line 9 draws upon that result. By the time Line 9 happens, ECX
contains a partial upper half of the multiplication, that is, x * (y >> 32)
signed. So EBX+ECX
ends up being what we computed in the last step plus the partial upper half we found on a previous line.
The full algorithm itself is pretty neat ;)
EDIT: In response to a comment below...
Line 4: Consider what SAR EDX, 31
(or if you like, sar $31, %edx
) really means. Since EDX
is a 32-bit register, you'll end up with one of two values. Which two? Consider what they mean in the context of signed arithmetic.
Line 7: EDX
by this point contains something pretty useful for the following operations. I'm just moving it where it needs to go.
Upvotes: 2
Reputation: 105
What imul does is multiplies the contents of eax with ecx and saves the lower 32 bits in eax and higher 32 bits in edx.
addl as far as I remember adds the two registers and saves it on the first one so in this case ebx. (I am not sure if it does anything else and the l after addl stands for long)
Upvotes: 0