Using SSE to mimic the standard Math.pow function

Question

I'm trying to learn how to work with SSE, so I decided to write a simple routine that computes n^d. The routine is called from a C program.

Here's my NASM code:

section .data

resmsg:     db      '%d^%d = %d', 0

section .bss

section .text

extern printf


; ------------------------------------------------------------
; Function called from a c program, I only use n and d parameters but I left the others
; ------------------------------------------------------------

global main

T       equ     8
n       equ     12
d       equ     16
m       equ     20
Sid     equ     24
Sn      equ     28

main:
    ; ------------------------------------------------------------
    ; Function enter sequence
    ; ------------------------------------------------------------
    push    ebp             ; save Base Pointer
    mov     ebp, esp        ; Move Base Point to current frame
    sub     esp, 8          ; reserve space for two local vars
    push    ebx             ; save some registers (don't know if needed)
    push    esi
    push    edi

    ; ------------------------------------------------------------
    ; copy the function's parameters from the stack into registers
    ; ------------------------------------------------------------
    mov     eax, [ebp+T]        ; T
    mov     ebx, [ebp+n]        ; n
    mov     ecx, [ebp+d]        ; d
    mov     edx, [ebp+m]        ; m
    mov     esi, [ebp+Sid]      ; Sid
    mov     edi, [ebp+Sn]       ; Sn    
    mov     [ebp-8], ecx        ; copy ecx into one of the local vars

    ;
    ; pow is computed by doing n*n d times
    ;
    movss   xmm0, [ebp+n]   ; base
    movss   xmm1, [ebp+n]   ; another copy of the base because xmm0 will be overwritten by the result

loop:   mulss   xmm0, xmm1      ; scalar mult from sse
        dec     ecx             ; counter--
        cmp     ecx,0           ; check if counter is 0 to end loop
        jnz     loop            ; 

    ;
    ; let's store the result in eax by moving it to the stack and then copying it to the register (the other local var is used as scratch space)
    ;
    movss   [ebp-4], xmm0       
    mov     eax, [ebp-4]

    ;
    ; Print using C's printf
    ;       
    push    eax                 ; result
    mov     ecx, [ebp-8]        ; copy the original d back since we used it as loop's counter
    push    ecx                 ; exponent
    push    ebx                 ; base
    push    resmsg              ; string format
    call    printf              ; printf call
    add     esp, 24             ; clean the stack from both our local and printf's vars

    ; ------------------------------------------------------------
    ; Function exit sequence
    ; ------------------------------------------------------------

    pop edi                     ; restore the registers
    pop esi
    pop ebx
    mov esp, ebp                ; restore the Stack Pointer
    pop ebp                     ; restore the Base Pointer
    ret                         ; get back to C program

Now, I'd expect it to print

4^2 = 16

but, instead, I got

4^2 = 0

I've spent my whole afternoon on this and couldn't find a solution. Do you have any hints?

EDIT:

Since it seems to be a format problem, I tried converting the data using

movss   [ebp-4], xmm0       
fld     dword [ebp-4]
mov     eax, dword [ebp-4]

instead of

movss   [ebp-4], xmm0       
mov     eax, [ebp-4]

but I got the same result.

rkhb · Accepted Answer

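MOVSS moves a single-precision (32-bit) float. I assume that n is an integer, so you can't load it into an XMM register with MOVSS: the integer's bit pattern would simply be reinterpreted as a float. Use CVTSI2SS instead, which converts the integer to a float. Likewise, printf cannot process a single-precision float; in a variadic call the compiler would convert it to a double, and the %d in your format string expects an integer anyway. It's convenient to convert the result back with CVTSS2SI at this point. So the code should look like: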

...
    ;
    ; pow is computed by doing n*n d times
    ;

    cvtsi2ss xmm0, [ebp+n]      ; load integer
    sub ecx, 1                  ; first step (n^1) is done
    cvtsi2ss xmm1, [ebp+n]      ; load integer

loop:
    mulss   xmm0, xmm1          ; scalar mult from sse
    sub     ecx, 1
    jnz     loop

    cvtss2si eax, xmm0          ; result as integer

    ;
    ; Print using C's printf
    ;
    push    eax                 ; result
    mov     ecx, [ebp-8]        ; copy the original d back since we used it as loop's counter
    push    ecx                 ; exponent
    push    ebx                 ; base
    push    resmsg              ; string format
    call    printf              ; printf call
    add     esp, 16             ; clean the stack only from printf's vars
...
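
To see why the original version printed 0, here is a rough C model of the two approaches. It is only a sketch: the function names are made up for illustration, memcpy stands in for the MOVSS load/store of raw bits, and plain C casts stand in for CVTSI2SS/CVTSS2SI. The integer 4 reinterpreted as a float is a tiny denormal (about 5.6e-45), so multiplying it by itself underflows to 0.0, whose bit pattern is 0. Also note that the loop in the code above assumes d >= 2; with d = 1 the counter is already 0 when the loop is entered, and sub/jnz would wrap around.

```c
#include <stdint.h>
#include <string.h>

/* Models "movss xmm0, [ebp+n]" when n is an int: the 32 bits are
   reused unchanged as a float, with no numeric conversion. */
static float bits_as_float(int32_t n) {
    float f;
    memcpy(&f, &n, sizeof f);
    return f;
}

/* Models "movss [ebp-4], xmm0" followed by "mov eax, [ebp-4]":
   the float's raw bits end up in an integer register. */
static int32_t float_as_bits(float f) {
    int32_t n;
    memcpy(&n, &f, sizeof n);
    return n;
}

/* Model of the question's loop (assumes d >= 1): start at the base
   and multiply d more times, all on reinterpreted bits. */
static int32_t pow_reinterpreted(int32_t n, int32_t d) {
    float base = bits_as_float(n);
    float acc = base;
    for (int32_t i = 0; i < d; ++i) {
        acc *= base;              /* mulss xmm0, xmm1 */
    }
    return float_as_bits(acc);    /* raw bits handed to printf's %d */
}

/* Model of the corrected loop (assumes d >= 2): convert the integer,
   multiply d-1 times, convert back. The C cast truncates, whereas
   CVTSS2SI rounds according to MXCSR (round-to-nearest by default);
   the two agree when the result is an exact integer. */
static int32_t pow_converted(int32_t n, int32_t d) {
    float base = (float)n;        /* cvtsi2ss */
    float acc = base;
    for (int32_t i = 0; i < d - 1; ++i) {
        acc *= base;              /* mulss xmm0, xmm1 */
    }
    return (int32_t)acc;          /* stands in for cvtss2si */
}
```

With these helpers, pow_reinterpreted(4, 2) returns 0, which matches the output in the question, while pow_converted(4, 2) returns 16. Even a correctly computed 16.0f would not print as 16 if its raw bits were passed to %d: float_as_bits(16.0f) is 0x41800000, i.e. 1098907648.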
