I have a program in C which uses a NASM function. Here is the code of the C program:
#include <stdio.h>
#include <string.h>
#include <math.h>
extern float hyp(float a); // supposed to calculate 1/(2 - a) + 6
void test(float (*f)(float)){
    printf("%f %f %f\n", f(2.1), f(2.1), f(2.1));
}
void main(int argc, char** argv){
    for(int i = 1; i < argc; i++){
        if(!strcmp(argv[i], "calculate")){
            test(hyp);
        }
    }
}
And here is the NASM function:
section .data
    a dd 1.0
    b dd 2.0
    c dd 6.0
section .text
global hyp
hyp:
    push ebp
    mov ebp, esp
    finit
    fld dword[b]
    fsub dword[ebp + 8]
    fstp dword[b]
    fld dword[a]
    fdiv dword[b]
    fadd dword[c]
    mov esp, ebp
    pop ebp
    ret
These programs were linked in Linux with gcc and nasm. Here is the Makefile:
all: project clean
main.o: main.c
	gcc -c main.c -o main.o -m32 -std=c99
hyp.o: hyp.asm
	nasm -f elf32 -o hyp.o hyp.asm -D UNIX
project: main.o hyp.o
	gcc -o project main.o hyp.o -m32 -lm
clean:
	rm -rf *.o
When the program is run, it outputs this:
5.767442 5.545455 -4.000010
The last number is correct. My question is: why do these results differ even though the input is the same?
The bug is that you do this:
fstp dword[b]
That overwrites the value of b, so the next time you call the function, the constant is wrong. Each call leaves b - 2.1 behind in b, so successive calls compute with b equal to -0.1, then -2.2, then -4.3. In the overall program's output, this shows up as the rightmost evaluation being the only correct one, because the compiler evaluated the arguments to printf from right to left. (It is allowed to evaluate the arguments to a multi-argument function in any order it wants.)
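To see how the stale constant produces those three numbers, here is a rough C model of the assembly routine. The names hyp_model and hyp_model_reset and the file-scope variable b are illustrative stand-ins, not part of the original program; b plays the role of the writable .data slot.

```c
/* Hypothetical C model of the buggy routine: `b` mimics the writable
   .data slot that `fstp dword[b]` overwrites on every call. */
static float b = 2.0f;

float hyp_model(float a)
{
    b = b - a;               /* fld [b]; fsub [arg]; fstp [b]  (clobbers b) */
    return 1.0f / b + 6.0f;  /* fld [a]; fdiv [b]; fadd [c] */
}

/* Restores the starting state so a run can be repeated. */
void hyp_model_reset(void)
{
    b = 2.0f;
}
```

Calling hyp_model(2.1f) three times as separate statements (so the order is fixed) gives roughly -4.00001, 5.545455 and 5.767442, which are the values from the question read from right to left.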
You should have used the .rodata section for your constants; then the program would crash rather than overwrite a constant.
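As a loose C analogy, declaring the constants const turns the accidental store into a compile-time error. This assumes a typical GCC/ELF toolchain, where const objects with static storage duration usually end up in a read-only section such as .rodata. The names hyp_const, K_ONE, K_TWO and K_SIX are hypothetical.

```c
/* Illustrative names; on typical GCC/ELF builds these const objects
   are placed in a read-only section such as .rodata. */
static const float K_ONE = 1.0f;
static const float K_TWO = 2.0f;
static const float K_SIX = 6.0f;

float hyp_const(float a)
{
    /* K_TWO = K_TWO - a;   would not compile: assignment to a const object */
    float diff = K_TWO - a;  /* the intermediate lives in a local instead */
    return K_ONE / diff + K_SIX;
}
```

Because nothing persistent is modified, repeated calls with the same argument return the same value.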
You can avoid needing to store and reload an intermediate value by using fdivr instead of fdiv.
hyp:
    fld   dword [b]         ; ST(0) = 2.0
    fsub  dword [esp + 4]   ; ST(0) = 2.0 - a
    fdivr dword [a]         ; ST(0) = 1.0 / (2.0 - a)
    fadd  dword [c]         ; ST(0) = 1/(2 - a) + 6
    ret                     ; result is returned in ST(0)
Alternatively, do what a Forth programmer would do, and load the constant 1 before everything else, so it's in ST(1) when it needs to be. This allows you to use fld1 instead of putting 1.0 in memory.
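hyp:
    fld1                    ; ST(0) = 1.0
    fld   dword [b]         ; ST(0) = 2.0, ST(1) = 1.0
    fsub  dword [esp + 4]   ; ST(0) = 2.0 - a
    fdivp                   ; ST(1) = ST(1) / ST(0), then pop
    fadd  dword [c]         ; ST(0) = 1/(2 - a) + 6
    ret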
You do not need to issue a finit, because the ABI guarantees that this was already done during process startup. You do not need to set up EBP for this function, as it does not make any function calls itself (the jargon term for this is "leaf procedure"), nor does it need any scratch space on the stack.
Another alternative, if you have a modern CPU, is to use the newer SSE2 instructions. That gives you normal registers instead of an operand stack, and it also means the calculations are all actually done in float instead of 80-bit extended, which can be very important: some complex numerical algorithms will malfunction if they have more floating-point precision than their designers expected. Because you're using the 32-bit ELF ABI, though, the return value still needs to wind up in ST(0), and since there are no direct move instructions between SSE and x87 registers, you have to go through memory. I don't know how to write SSE2 instructions in Intel syntax, sorry.
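hyp:
    subl   $4, %esp          # scratch slot for the SSE-to-x87 transfer
    movss  b, %xmm1          # xmm1 = 2.0
    subss  8(%esp), %xmm1    # xmm1 = 2.0 - a (argument is now at esp+8)
    movss  a, %xmm0          # xmm0 = 1.0
    divss  %xmm1, %xmm0      # xmm0 = 1.0 / (2.0 - a)
    addss  c, %xmm0          # xmm0 += 6.0
    movss  %xmm0, (%esp)     # spill the result to memory
    flds   (%esp)            # reload it into ST(0) as the return value
    addl   $4, %esp
    ret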
In the 64-bit ELF ABI, with floating-point return values in XMM0 (and argument passing in registers by default as well), that would just be
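hyp:
    movss  b(%rip), %xmm1    # xmm1 = 2.0
    subss  %xmm0, %xmm1      # xmm1 = 2.0 - a (argument arrives in xmm0)
    movss  a(%rip), %xmm0    # xmm0 = 1.0
    divss  %xmm1, %xmm0      # xmm0 = 1.0 / (2.0 - a)
    addss  c(%rip), %xmm0    # xmm0 += 6.0; return value stays in xmm0
    ret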
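Returning to the point about float versus 80-bit extended precision, the sketch below evaluates the same formula once with every intermediate rounded to float and once in long double. The names hyp_float and hyp_extended are hypothetical. Treating long double as the 80-bit x87 format is an assumption that holds for GCC on x86 Linux but not on every platform.

```c
/* Every intermediate is rounded to float, as with scalar SSE (movss/subss/...). */
float hyp_float(float a)
{
    float diff = 2.0f - a;
    float quot = 1.0f / diff;
    return quot + 6.0f;
}

/* Intermediates kept in long double (80-bit extended on x86 with GCC;
   this mapping is platform-dependent). */
long double hyp_extended(float a)
{
    long double diff = 2.0L - (long double)a;
    long double quot = 1.0L / diff;
    return quot + 6.0L;
}
```

For a = 2.1f both results are close to -4.00001, but they need not agree in the last bits, because the float version rounds after each operation while the extended version carries extra bits through to the end.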