user1282285
user1282285

Reputation: 9

How to detect overflow conditions in Assembly Language X86

I have an assignment in which we have to write two functions. Also must detect overflow conditions using the processor's condition codes and return 0 to indicate that an error has been encountered. I was able to write the functions.

 .file  "formula.c"  
    .text
.globl _nCr  
    .def    _nCr;   .scl    2;  .type   32; .endef  
_nCr:  
        pushl   %ebp  
    movl    %esp, %ebp  
    subl    $56, %esp  
    movl    8(%ebp), %eax  
    movl    %eax, (%esp)  
    testl %eax, %eax  
    call    _factorial  
    movl    %eax, -12(%ebp)  
    movl    12(%ebp), %eax  
    addl    $1, %eax  
    movl    %eax, (%esp)  
    call    _factorial  
    movl    %eax, -16(%ebp)  
    movl    12(%ebp), %eax  
    notl    %eax  
    addl    8(%ebp), %eax  
    movl    %eax, (%esp)  
    call    _factorial  
    movl    %eax, -20(%ebp)  
    movl    -16(%ebp), %eax  
    movl    %eax, %edx  
    imull   -20(%ebp), %edx  
    movl    %edx, -28(%ebp)  
    movl    -12(%ebp), %eax  
    movl    %eax, %edx  
    sarl    $31, %edx  
    idivl   -28(%ebp)  
    leave  
    ret  
.globl _factorial   
    .def    _factorial;  .scl    2;     .type   32;     .endef   
_factorial:  
    pushl   %ebp  
    movl    %esp, %ebp  
    subl    $16, %esp  
    movl    $1, -8(%ebp)  
    movl    $1, -4(%ebp)  
    jmp L3  
L4:   
    movl    -8(%ebp), %eax   
    imull   -4(%ebp), %eax  
    movl    %eax, -8(%ebp)  
    addl    $1, -4(%ebp)   
L3:
    movl    -4(%ebp), %eax  
    cmpl    8(%ebp), %eax  
    jle L4  
    movl    -8(%ebp), %eax  
    leave  
    ret  
    .def    ___main;    .scl    2;  .type   32; .endef  
    .section .rdata,"dr"  
    .align 4  

This function basically does nCr = n! / (r! (n-r)!). The overflow occurs in factorial when the numbers get larger.

I just do not understand how I would set the overflow conditions.

Upvotes: 0

Views: 5692

Answers (3)

Peter Cordes
Peter Cordes

Reputation: 363980

Most instructions set OF on signed overflow, or CF on unsigned overflow. http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt explains for add/sub. (Bitwise boolean instructions like and/or/xor can't overflow so they always clear CF and OF).

imul sets both OF and CF when the full result isn't the sign extension of the low half result (same width as the inputs). This applies even for the efficient 2-operand form that don't write the high half anywhere; they still set flags according to what it would be. If you unsigned overflow detection for multiply, you need to use the clumsy one-operand mul.

Division raises a #DE exception when the quotient doesn't fit into AL/AX/EAX/RAX (depending on operand-size). Unfortunately there's no way to suppress / mask this, so you can't try 2N / N => N-bit division with a large dividend and detect overflow after the fact, unless you have a signal handler to catch SIGFPE (on POSIX OSes). Or on bare metal, an interrupt handler for #DE.

For combinatorials specifically:

Instead of naively calculating n!, you can cancel earlier and just compute prod(r+1 .. n). Actually use the larger or r or n-r, and divide by the other one.

You still end with dividing by a potentially large number so this doesn't eliminate the chance of overflow for all possible results that fit in a 32-bit integer. But it extends the range you can handle simply, and of course is faster because you're doing fewer multiplies. e.g. C(999, 1000) only does 1000 / (1000-999)! so no multiplies and only one div.

If you do the last multiply of the product with a mul instruction to produce a 64-bit result in EDX:EAX, you can use that as the 64-bit dividend for a 32-bit division. (If you want to risk an exception.)

mul is NxN => 2N multiplication, so if you just use it in a loop it ignores the high half of the previous output. If you multiply in low-to-high order so the last multiply is the high end of the range, that gives you the biggest possible range for this to work.

So for example you might do

// mix of C pseudocode and AT&T 32-bit.
//  Some real registers, some var names: pick registers for those.

   if (n == r) return 1;
   divisor = factorial(min(r, n-r));   // and save in another register until later
   eax = max(r,n-r) + 1;               // prod

   xor  %edx, %edx        # in case we skip the mul
   cmp  n, %eax
   jae  endprod           # loop might need to run 0 times, but n==r case already handled

   lea  1(%eax), %ecx     # i = low+1.  Not overflow-checked
   jmp  loop_entry

 prod_loop:                  # do{
    imul  %ecx, %eax         # prod *= i for i=[max+2 .. n-1]
     jo  overflow_in_prod    # maybe need mul to avoid spurious signed but not unsigned overflow cases
    inc   %ecx
  loop_entry:
    cmp   n, %ecx
    jb    prod_loop          # }while(i<n)
                           # leave the loop with ECX = n, with one multiply left to do
    mul   %ecx             # EDX:EAX = n * EAX
    # We're keeping the full 64-bit result, therefore this can't overflow
 endprod:

    div   divisor          # prod /= the smaller factorial
    # EDX:EAX / divisor, quotient in EAX.  Or will raise #DE if it didn't fit.
    ret

overflow_in_prod:
   do something
   ret

Untested and not carefully thought through, might be off-by-one errors / corner case bugs in the loop setup / bounds.

This is the kind of thing I was describing: we can check for overflow while accumulating the product, except for the last step which we allow to produce a 64-bit result.

There are probably cases where the prod loop's last imul will produce a 32-bit unsigned result with the high bit set, but no unsigned overflow. Using imul/jo would spuriously detect that as overflow, because it is signed overflow. And where the final div wouldn't overflow. So if you care about that more than speed, use a (slightly slower) mul there, too.

Anyway, this lets us handle C(18, 9) where prod(10 .. 18) = 0x41b9e4200. The last imul will produce EAX = 0x3a6c5900 which fits in 32 bits, and the final mul will multiply that by 18 to produce EDX:EAX = 0x41b9e4200 (35 significant bits). Then we divide that by 9! = 0x58980 and get EAX = 0xbdec.

The number of significant bits in EDX:EAX can be even larger when n and r are large (but close together so we still avoid overflow). They have to be far enough apart for the (n-r)! divisor to be large enough to bring the final result back down to fit in 32 bits, though.

Otherwise you'd need extended-precision division, which is possible...

Upvotes: 2

Kaz
Kaz

Reputation: 58500

On the x86 architecture, when an arithmetic instruction executes such as addl 8(%ebp), %eax the condition codes are set in the CPU status word. There are instructions whose behavior depends on condition codes.

You can have the code take an alternate path (execute a branch) on a given condition. The x86 has a family of conditional branching instructions under the Jxx mnemonics: JA, JAE, JB, JBE, JC, JCXZ, ..., JZ. For instance JZ means jump if zero: take a branch if the instruction produced a zero result, setting the zero flag. JO is jump on overflow.

A condition can also be converted to a byte datum and stored into a register or memory. This is useful for compiling C expressions like:

 x = (y != 3); /* if (y != 3) x = 1; else x = 0 */

It is done by the SETx group of instructions which are also numerous, like the conditional branches: SETA, SETAE, SETB, ..., SETZ. For instance SETZ will set a given byte to 1 if the zero condition is true. E.g.

 seto %bl  /* set bottom byte of B register to 1 if overflow flag set */

Upvotes: 3

paulsm4
paulsm4

Reputation: 121599

1) Your arithmetic commands are the operations that could potentially set the overflow bit

2) The "JO" (jump on overflow) and "JNO" (jump on not overflow) allow you to branch, depending on whether an overflow occurred or not

3) You'd probably just set "%eax" to 0 after "JO".

4) Excellent, excellent resource if you're not already familiar with it:

Programming from the Ground Up, Jonathan Bartlett

Upvotes: 3

Related Questions