Reputation: 1174
After some researches I can't find out which division algorithm is used by the __div_64_32 function in the Linux kernel:
# arch/powerpc/boot/div64.S
#include "ppc_asm.h"
.globl __div64_32
__div64_32:
lwz r5,0(r3) # get the dividend into r5/r6
lwz r6,4(r3)
cmplw r5,r4
li r7,0
li r8,0
blt 1f
divwu r7,r5,r4 # if dividend.hi >= divisor,
mullw r0,r7,r4 # quotient.hi = dividend.hi / divisor
subf. r5,r0,r5 # dividend.hi %= divisor
beq 3f
1: mr r11,r5 # here dividend.hi != 0
andis. r0,r5,0xc000
bne 2f
cntlzw r0,r5 # we are shifting the dividend right
li r10,-1 # to make it < 2^32, and shifting
srw r10,r10,r0 # the divisor right the same amount,
addc r9,r4,r10 # rounding up (so the estimate cannot
andc r11,r6,r10 # ever be too large, only too small)
andc r9,r9,r10
addze r9,r9
or r11,r5,r11
rotlw r9,r9,r0
rotlw r11,r11,r0
divwu r11,r11,r9 # then we divide the shifted quantities
2: mullw r10,r11,r4 # to get an estimate of the quotient,
mulhwu r9,r11,r4 # multiply the estimate by the divisor,
subfc r6,r10,r6 # take the product from the divisor,
add r8,r8,r11 # and add the estimate to the accumulated
subfe. r5,r9,r5 # quotient
bne 1b
3: cmplw r6,r4
blt 4f
divwu r0,r6,r4 # perform the remaining 32-bit division
mullw r10,r0,r4 # and get the remainder
add r8,r8,r0
subf r6,r10,r6
4: stw r7,0(r3) # return the quotient in *r3
stw r8,4(r3)
mr r3,r6 # return the remainder in r3
blr
From what I understood, the division is performed in 2 steps, but I'm quite confused with "estimate of the quotient" concept.
Moreover, considering the function code, I don't understand what this particular instruction does:
andis. r0,r5,0xc000 # why this 0xC000 value ?
Does anybody have some more explanation about this piece of code ?
Upvotes: 2
Views: 211
Reputation: 4041
The main function is:
r5:r6 -> r7:r8 * r4 + r6
It's decomposed as:
0 Load:
Load from memory
Set the temp registers
Make the first division (divwu r7,r5,r4)
1 Divide Loop:
Shift left
Take carre of carry
Divide
2 Accumulate Loop:
Accumulate
If hight part != 0 goto 1
3 Last low division:
Divide the low part the old way 32 bit / 32bit
4 Store result:
After the first division (end of part 0), the remainder of the hight part of the dividend (r5) is lower than the divisor (r4).
We cannot divide it anymore. We must shift it and remember the shift applyed. So we count the leading 0 in r5 cntlzw r0,r5
. We shift the dividend by this amount, divide, shift back.
The algorithm used is r5/r3 = ( r5 * 2^x /r3 ) / 2^x. The difficulty is with the carry, overflow ...
You gave the good question: why andis. r0,r5,0xc000
.
From the doc: andis. rz,rv,i: rz16−31 = rv & (i << 16), rz32−63 = rz0−15 = 0
. From python: bin(0xc000 << 16) = 0b11000000000000000000000000000000
. SO it is here because, we should not shift if the first or second bit of the dividend.hi is setted.
Well is the first is setted, we shift by zero so useless, we just loose time. If the second bit is setted... My guess is that we may overflow something when shifting right after the division. Espescially because we shift left as much as possible at the begining of label 1 (get x as larger as possible reduces the loop number).
Finally what he calls estimators we (I) call it accumulators. He calls them estimator cause they accumulate each time a smaller number.
Salut Bordeaux de Rennes !
Upvotes: 2