Reputation: 63

Segmentation fault in x86 while computing floating point sequence

Assembly program to compute sum of given sequence: 1+(1/1!)+(1/2!)+...+(1/10!) using floating point registers. Why do I get a segmentation fault(Core dumped) error in the following program?

global  main
main:
    mov eax,2
    mov ebx, 2
    mov edx, 2
.fact:
    fld dword [ecx]
    fmul dword [ebx]
    fstp dword [ecx]
    fld dword [ebx]
    fadd dword [one]
    fstp dword [ebx]
    fld dword [ebx]
    fld dword [eax]
    fcomi st0,st1
    ja .fact
.sum:
    fld dword [one]
    fdiv dword [ecx]
    fadd dword [edx]
    fstp dword [edx]
    fld dword [ten]    
    fld dword [eax]
    fcomi st0,st1
    ja .exit
    fadd dword [one]
    mov ebx,2
    jmp .fact
.exit:
    push edx
    push    dword msg       
    call    printf  
    add     sp, 8

Upvotes: 1

Answers (1)

user1354557

Reputation: 2503

You have the right idea, but I can see a number of issues with this code that will prevent it from working as intended:

You need to allocate memory

As it was already mentioned in the comments, the brackets in these floating-point instructions indicate loads/stores to memory. For example, the statement fld dword [ecx] does not load the contents of ecx into st0. It loads the 32-bit floating point value at address ecx into st0. Right now, in several cases, you're trying to read/write from address 2.

So for all of those loads and stores to work, you'll need to allocate variables on your stack. You use four of them (eax, ecx, edx, and ebx), and each is four bytes in size, so you can keep the register usage the way it currently is if you start your routine with something like:

sub esp, 0x10
lea eax, [esp]
lea ecx, [esp+4]
lea edx, [esp+8]
lea ebx, [esp+0xC]

If you're unfamiliar with lea, it loads an address without reading from memory. lea is the only¹ x86 instruction where the brackets don't imply a memory access. So for example, the second line is functionally equivalent to mov eax, esp.

Finally, remember to put the stack pointer back where it was before you leave the routine (add esp, 0x10).

You're not setting floating-point constants correctly

As mentioned in the previous section, instructions like mov ebx, 2 will not do what you want, because we want ebx to point to some address in memory where we can store floats.

But even if you wrote mov dword [ebx], 2 to store 2 to address ebx, this would still be wrong, because in this routine, we treat [ebx] as a float, whereas the 2 you're storing to that location is the integer representation of 2.

The fild instruction loads an integer from memory, converts it to a floating-point value, and stores it at the top of the FPU stack. For example, the following instructions would convert an integer to a float:

mov dword [ebx], 2  ; Now [ebx] contains 2
fild dword [ebx]    ; Now st0 contains 2.0
fstp dword [ebx]    ; Now [ebx] contains 2.0

You're overflowing the FPU stack

Every time you use fld you push a value onto the FPU stack. Every fstp pops a value off the FPU stack. (That's what the 'p' means in fstp) The way you're using fadd, fmul, and fdiv just replace the value which is currently at the top of the FPU stack (a.k.a. st0). That's all fine, but fcomi doesn't pop anything off the FPU stack. So every time you go through the loop and do two flds followed by a fcomi, you're adding two values to the stack and leaving them there.

The FPU stack can only store 8 values (st0 through st7). So if all eight are filled and you try to push another value onto the FPU stack (e.g. with fld), then you'll get a FPU stack-overflow exception and the load will fail.

There is another compare instruction, fcomip, but it only pops one value off the FPU stack. To pop the other, you can use fstp st0. This instruction basically means "Store st0 to st0, then pop the top value off the FPU stack."

`printf` doesn't support floats

When passing a float to a variadic function, it is automatically promoted to a double. As printf is a variadic function, the %f specifier (then of course) also expects a double. So, at the end of your function, you want to print the contents of [edx] (not edx!) as a double. Since printf expects a double (which is 64 bits wide), something like push dword [edx] won't cut it.

When you load values onto the FPU stack, they're stored with the highest precision possible. When you store them to memory (via fstp) you get to specify the precision (and thus the format and width). When you say fstp dword [address], you're specifying dword width, i.e. a 32-bit float. If you want a double, you just specify qword.

We can store values directly onto the stack. So try something like:

sub esp, 8       ; Make space for the 64-bit double we're about to store
fld dword [edx]  ; Load the 32-bit float we want to print
fstp qword [esp] ; Store the 64-bit double to the space we just allocated

Remember that after the printf, you'll need to readjust your stack pointer (not just by 8 bytes anymore!). Also, note that the stack pointer in any 32-bit mode is esp, not sp.

You need to initialize your product

The fourth instruction you have up there is fld dword [ecx]. We haven't initialized ecx yet in this routine. Since you are using [ecx] to store the product for your factorial, you'll want this to be set to 1.0 before each time you begin the .fact loop.

The condition on your factorial loop is off by one

It looks like you're making an optimization by starting your main loop counter ([eax]) with 2 and initializing your sum with 2 = 1/0! + 1/1!. And before you run through the .fact loop, you initialize [ebx] to 2, presumably another optimization to avoid multiplying by 1. In pseudocode, your algorithm looks like this:

// B is initialized to 2
.fact:
    C = C * B
    B = B + 1
    if A > B: goto .fact

So essentially (if C starts at 1), C is the product of all integers from 2 to A-1, always including 2 no matter what.

This isn't exactly what you want. Remember that you want to include 1/10! in the sum, which, as written above, would require executing .fact with A=11. Also, consider the result of C in this algorithm for different values of A:

A:  2  3  4  5   6   ...
C:  2  2  6  24  120 ...

So referencing the above chart, according to the rest of your algorithm, the final sum will be 2 + 1/2 + 1/2 + 1/6 + 1/24 + 1/120 .... This will be off by 1/2.

You can fix both of these issues by changing one line.

Extra stuff

You need to exit your program somehow or you'll just keep executing instructions beyond the boundaries of the program until you segfault
There are special instructions for loading common values onto the FPU stack. Two of them could be useful to you: fldz loads +0.0 and fld1 loads 1.0
Any time you want an answer to the question "Why did my program segfault?" you should run the program in a debugger. It will show you the exact instruction where the segfault occurred and allow you to examine the state of the program (registers, contents of memory, etc). The payoff of learning a few quick commands in the debugger is well worth it :)

¹ It's the only one you likely need to care about. To be pedantic, the nop instruction can also take an address argument, but as you would expect, it does not actually access the memory because it's a no-op.

Upvotes: 2