Reputation: 63
Assembly program to compute sum of given sequence: 1+(1/1!)+(1/2!)+...+(1/10!) using floating point registers. Why do I get a segmentation fault(Core dumped) error in the following program?
global main
main:
mov eax,2
mov ebx, 2
mov edx, 2
.fact:
fld dword [ecx]
fmul dword [ebx]
fstp dword [ecx]
fld dword [ebx]
fadd dword [one]
fstp dword [ebx]
fld dword [ebx]
fld dword [eax]
fcomi st0,st1
ja .fact
.sum:
fld dword [one]
fdiv dword [ecx]
fadd dword [edx]
fstp dword [edx]
fld dword [ten]
fld dword [eax]
fcomi st0,st1
ja .exit
fadd dword [one]
mov ebx,2
jmp .fact
.exit:
push edx
push dword msg
call printf
add sp, 8
Upvotes: 1
Views: 846
Reputation: 2503
You have the right idea, but I can see a number of issues with this code that will prevent it from working as intended:
As it was already mentioned in the comments, the brackets in these floating-point instructions indicate loads/stores to memory. For example, the statement fld dword [ecx]
does not load the contents of ecx
into st0
. It loads the 32-bit floating point value at address ecx
into st0
. Right now, in several cases, you're trying to read/write from address 2
.
So for all of those loads and stores to work, you'll need to allocate variables on your stack. You use four of them (eax
, ecx
, edx
, and ebx
), and each is four bytes in size, so you can keep the register usage the way it currently is if you start your routine with something like:
sub esp, 0x10
lea eax, [esp]
lea ecx, [esp+4]
lea edx, [esp+8]
lea ebx, [esp+0xC]
If you're unfamiliar with lea
, it loads an address without reading from memory. lea
is the only1 x86 instruction where the brackets don't imply a memory access. So for example, the second line is functionally equivalent to mov eax, esp
.
Finally, remember to put the stack pointer back where it was before you leave the routine (add esp, 0x10
).
As mentioned in the previous section, instructions like mov ebx, 2
will not do what you want, because we want ebx
to point to some address in memory where we can store floats.
But even if you wrote mov dword [ebx], 2
to store 2
to address ebx
, this would still be wrong, because in this routine, we treat [ebx]
as a float, whereas the 2
you're storing to that location is the integer representation of 2.
The fild
instruction loads an integer from memory, converts it to a floating-point value, and stores it at the top of the FPU stack. For example, the following instructions would convert an integer to a float:
mov dword [ebx], 2 ; Now [ebx] contains 2
fild dword [ebx] ; Now st0 contains 2.0
fstp dword [ebx] ; Now [ebx] contains 2.0
Every time you use fld
you push a value onto the FPU stack. Every fstp
pops a value off the FPU stack. (That's what the 'p' means in fstp
) The way you're using fadd
, fmul
, and fdiv
just replace the value which is currently at the top of the FPU stack (a.k.a. st0
). That's all fine, but fcomi
doesn't pop anything off the FPU stack. So every time you go through the loop and do two fld
s followed by a fcomi
, you're adding two values to the stack and leaving them there.
The FPU stack can only store 8 values (st0
through st7
). So if all eight are filled and you try to push another value onto the FPU stack (e.g. with fld
), then you'll get a FPU stack-overflow exception and the load will fail.
There is another compare instruction, fcomip
, but it only pops one value off the FPU stack. To pop the other, you can use fstp st0
. This instruction basically means "Store st0
to st0
, then pop the top value off the FPU stack."
printf
doesn't support floatsWhen passing a float
to a variadic function, it is automatically promoted to a double
. As printf
is a variadic function, the %f
specifier (then of course) also expects a double
. So, at the end of your function, you want to print the contents of [edx]
(not edx
!) as a double. Since printf
expects a double (which is 64 bits wide), something like push dword [edx]
won't cut it.
When you load values onto the FPU stack, they're stored with the highest precision possible. When you store them to memory (via fstp
) you get to specify the precision (and thus the format and width). When you say fstp dword [address]
, you're specifying dword
width, i.e. a 32-bit float. If you want a double
, you just specify qword
.
We can store values directly onto the stack. So try something like:
sub esp, 8 ; Make space for the 64-bit double we're about to store
fld dword [edx] ; Load the 32-bit float we want to print
fstp qword [esp] ; Store the 64-bit double to the space we just allocated
Remember that after the printf
, you'll need to readjust your stack pointer (not just by 8 bytes anymore!). Also, note that the stack pointer in any 32-bit mode is esp
, not sp
.
The fourth instruction you have up there is fld dword [ecx]
. We haven't initialized ecx
yet in this routine. Since you are using [ecx]
to store the product for your factorial, you'll want this to be set to 1.0
before each time you begin the .fact
loop.
It looks like you're making an optimization by starting your main loop counter ([eax]
) with 2
and initializing your sum with 2 = 1/0! + 1/1!
. And before you run through the .fact
loop, you initialize [ebx]
to 2
, presumably another optimization to avoid multiplying by 1. In pseudocode, your algorithm looks like this:
// B is initialized to 2
.fact:
C = C * B
B = B + 1
if A > B: goto .fact
So essentially (if C
starts at 1), C
is the product of all integers from 2
to A-1
, always including 2
no matter what.
This isn't exactly what you want. Remember that you want to include 1/10!
in the sum, which, as written above, would require executing .fact
with A=11
. Also, consider the result of C
in this algorithm for different values of A
:
A: 2 3 4 5 6 ...
C: 2 2 6 24 120 ...
So referencing the above chart, according to the rest of your algorithm, the final sum will be 2 + 1/2 + 1/2 + 1/6 + 1/24 + 1/120 ...
. This will be off by 1/2
.
You can fix both of these issues by changing one line.
fldz
loads +0.0
and fld1
loads 1.0
1 It's the only one you likely need to care about. To be pedantic, the nop
instruction can also take an address argument, but as you would expect, it does not actually access the memory because it's a no-op.
Upvotes: 2