Reputation: 9
I have an assignment in which we have to write two functions. Also must detect overflow conditions using the processor's condition codes and return 0
to indicate that an error has been encountered. I was able to write the functions.
.file "formula.c"
.text
.globl _nCr
.def _nCr; .scl 2; .type 32; .endef
_nCr:
pushl %ebp
movl %esp, %ebp
subl $56, %esp
movl 8(%ebp), %eax
movl %eax, (%esp)
testl %eax, %eax
call _factorial
movl %eax, -12(%ebp)
movl 12(%ebp), %eax
addl $1, %eax
movl %eax, (%esp)
call _factorial
movl %eax, -16(%ebp)
movl 12(%ebp), %eax
notl %eax
addl 8(%ebp), %eax
movl %eax, (%esp)
call _factorial
movl %eax, -20(%ebp)
movl -16(%ebp), %eax
movl %eax, %edx
imull -20(%ebp), %edx
movl %edx, -28(%ebp)
movl -12(%ebp), %eax
movl %eax, %edx
sarl $31, %edx
idivl -28(%ebp)
leave
ret
.globl _factorial
.def _factorial; .scl 2; .type 32; .endef
_factorial:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -8(%ebp)
movl $1, -4(%ebp)
jmp L3
L4:
movl -8(%ebp), %eax
imull -4(%ebp), %eax
movl %eax, -8(%ebp)
addl $1, -4(%ebp)
L3:
movl -4(%ebp), %eax
cmpl 8(%ebp), %eax
jle L4
movl -8(%ebp), %eax
leave
ret
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
.align 4
This function basically does nCr = n! / (r! (n-r)!)
. The overflow occurs in factorial when the numbers get larger.
I just do not understand how I would set the overflow conditions.
Upvotes: 0
Views: 5692
Reputation: 363980
Most instructions set OF on signed overflow, or CF on unsigned overflow. http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt explains for add/sub. (Bitwise boolean instructions like and/or/xor can't overflow so they always clear CF and OF).
imul
sets both OF and CF when the full result isn't the sign extension of the low half result (same width as the inputs). This applies even for the efficient 2-operand form that don't write the high half anywhere; they still set flags according to what it would be. If you unsigned overflow detection for multiply, you need to use the clumsy one-operand mul
.
Division raises a #DE exception when the quotient doesn't fit into AL/AX/EAX/RAX (depending on operand-size). Unfortunately there's no way to suppress / mask this, so you can't try 2N / N => N-bit division with a large dividend and detect overflow after the fact, unless you have a signal handler to catch SIGFPE (on POSIX OSes). Or on bare metal, an interrupt handler for #DE.
Instead of naively calculating n!
, you can cancel earlier and just compute prod(r+1 .. n)
. Actually use the larger or r
or n-r
, and divide by the other one.
You still end with dividing by a potentially large number so this doesn't eliminate the chance of overflow for all possible results that fit in a 32-bit integer. But it extends the range you can handle simply, and of course is faster because you're doing fewer multiplies. e.g. C(999, 1000)
only does 1000 / (1000-999)!
so no multiplies and only one div
.
If you do the last multiply of the product with a mul
instruction to produce a 64-bit result in EDX:EAX, you can use that as the 64-bit dividend for a 32-bit division. (If you want to risk an exception.)
mul
is NxN => 2N multiplication, so if you just use it in a loop it ignores the high half of the previous output. If you multiply in low-to-high order so the last multiply is the high end of the range, that gives you the biggest possible range for this to work.
So for example you might do
// mix of C pseudocode and AT&T 32-bit.
// Some real registers, some var names: pick registers for those.
if (n == r) return 1;
divisor = factorial(min(r, n-r)); // and save in another register until later
eax = max(r,n-r) + 1; // prod
xor %edx, %edx # in case we skip the mul
cmp n, %eax
jae endprod # loop might need to run 0 times, but n==r case already handled
lea 1(%eax), %ecx # i = low+1. Not overflow-checked
jmp loop_entry
prod_loop: # do{
imul %ecx, %eax # prod *= i for i=[max+2 .. n-1]
jo overflow_in_prod # maybe need mul to avoid spurious signed but not unsigned overflow cases
inc %ecx
loop_entry:
cmp n, %ecx
jb prod_loop # }while(i<n)
# leave the loop with ECX = n, with one multiply left to do
mul %ecx # EDX:EAX = n * EAX
# We're keeping the full 64-bit result, therefore this can't overflow
endprod:
div divisor # prod /= the smaller factorial
# EDX:EAX / divisor, quotient in EAX. Or will raise #DE if it didn't fit.
ret
overflow_in_prod:
do something
ret
Untested and not carefully thought through, might be off-by-one errors / corner case bugs in the loop setup / bounds.
This is the kind of thing I was describing: we can check for overflow while accumulating the product, except for the last step which we allow to produce a 64-bit result.
There are probably cases where the prod loop's last imul
will produce a 32-bit unsigned result with the high bit set, but no unsigned overflow. Using imul/jo would spuriously detect that as overflow, because it is signed overflow. And where the final div wouldn't overflow. So if you care about that more than speed, use a (slightly slower) mul
there, too.
Anyway, this lets us handle C(18, 9)
where prod(10 .. 18) = 0x41b9e4200
. The last imul will produce EAX = 0x3a6c5900
which fits in 32 bits, and the final mul
will multiply that by 18 to produce EDX:EAX = 0x41b9e4200
(35 significant bits). Then we divide that by 9! = 0x58980
and get EAX = 0xbdec.
The number of significant bits in EDX:EAX can be even larger when n
and r
are large (but close together so we still avoid overflow). They have to be far enough apart for the (n-r)!
divisor to be large enough to bring the final result back down to fit in 32 bits, though.
Otherwise you'd need extended-precision division, which is possible...
Upvotes: 2
Reputation: 58500
On the x86 architecture, when an arithmetic instruction executes such as addl 8(%ebp), %eax
the condition codes are set in the CPU status word. There are instructions whose behavior depends on condition codes.
You can have the code take an alternate path (execute a branch) on a given condition. The x86 has a family of conditional branching instructions under the Jxx
mnemonics: JA, JAE, JB, JBE, JC, JCXZ, ..., JZ
. For instance JZ
means jump if zero: take a branch if the instruction produced a zero result, setting the zero flag. JO
is jump on overflow.
A condition can also be converted to a byte datum and stored into a register or memory. This is useful for compiling C expressions like:
x = (y != 3); /* if (y != 3) x = 1; else x = 0 */
It is done by the SETx
group of instructions which are also numerous, like the conditional branches: SETA, SETAE, SETB, ..., SETZ
. For instance SETZ will set a given byte to 1 if the zero condition is true. E.g.
seto %bl /* set bottom byte of B register to 1 if overflow flag set */
Upvotes: 3
Reputation: 121599
1) Your arithmetic commands are the operations that could potentially set the overflow bit
2) The "JO" (jump on overflow) and "JNO" (jump on not overflow) allow you to branch, depending on whether an overflow occurred or not
3) You'd probably just set "%eax" to 0 after "JO".
4) Excellent, excellent resource if you're not already familiar with it:
Programming from the Ground Up, Jonathan Bartlett
Upvotes: 3