Algorithm used by __div_64_32

Question

After some researches I can't find out which division algorithm is used by the __div_64_32 function in the Linux kernel:

# arch/powerpc/boot/div64.S
#include "ppc_asm.h"

    .globl __div64_32
__div64_32:
    lwz r5,0(r3)    # get the dividend into r5/r6
    lwz r6,4(r3)
    cmplw   r5,r4
    li  r7,0
    li  r8,0
    blt 1f
    divwu   r7,r5,r4    # if dividend.hi >= divisor,
    mullw   r0,r7,r4    # quotient.hi = dividend.hi / divisor
    subf.   r5,r0,r5    # dividend.hi %= divisor
    beq 3f
1:  mr  r11,r5      # here dividend.hi != 0
    andis.  r0,r5,0xc000
    bne 2f
    cntlzw  r0,r5       # we are shifting the dividend right
    li  r10,-1      # to make it < 2^32, and shifting
    srw r10,r10,r0  # the divisor right the same amount,
    addc    r9,r4,r10   # rounding up (so the estimate cannot
    andc    r11,r6,r10  # ever be too large, only too small)
    andc    r9,r9,r10
    addze   r9,r9
    or  r11,r5,r11
    rotlw   r9,r9,r0
    rotlw   r11,r11,r0
    divwu   r11,r11,r9  # then we divide the shifted quantities
2:  mullw   r10,r11,r4  # to get an estimate of the quotient,
    mulhwu  r9,r11,r4   # multiply the estimate by the divisor,
    subfc   r6,r10,r6   # take the product from the divisor,
    add r8,r8,r11   # and add the estimate to the accumulated
    subfe.  r5,r9,r5    # quotient
    bne 1b
3:  cmplw   r6,r4
    blt 4f
    divwu   r0,r6,r4    # perform the remaining 32-bit division
    mullw   r10,r0,r4   # and get the remainder
    add r8,r8,r0
    subf    r6,r10,r6
4:  stw r7,0(r3)    # return the quotient in *r3
    stw r8,4(r3)
    mr  r3,r6       # return the remainder in r3
    blr

From what I understood, the division is performed in 2 steps, but I'm quite confused with "estimate of the quotient" concept.

Moreover, considering the function code, I don't understand what this particular instruction does:

andis.  r0,r5,0xc000 # why this 0xC000 value ?

Does anybody have some more explanation about this piece of code ?

Tinmarino · Accepted Answer

The main function is:

r5:r6 -> r7:r8 * r4 + r6

It's decomposed as:

0 Load:
    Load from memory
    Set the temp registers
    Make the first division (divwu  r7,r5,r4)

1 Divide Loop:
    Shift left 
    Take carre of carry
    Divide

2 Accumulate Loop:
    Accumulate
    If hight part != 0 goto 1

3 Last low division:
    Divide the low part the old way 32 bit / 32bit

4 Store result:

After the first division (end of part 0), the remainder of the hight part of the dividend (r5) is lower than the divisor (r4). We cannot divide it anymore. We must shift it and remember the shift applyed. So we count the leading 0 in r5 cntlzw r0,r5. We shift the dividend by this amount, divide, shift back.

The algorithm used is r5/r3 = ( r5 * 2^x /r3 ) / 2^x. The difficulty is with the carry, overflow ...

You gave the good question: why andis. r0,r5,0xc000. From the doc: andis. rz,rv,i: rz16−31 = rv & (i << 16), rz32−63 = rz0−15 = 0. From python: bin(0xc000 << 16) = 0b11000000000000000000000000000000. SO it is here because, we should not shift if the first or second bit of the dividend.hi is setted.

Well is the first is setted, we shift by zero so useless, we just loose time. If the second bit is setted... My guess is that we may overflow something when shifting right after the division. Espescially because we shift left as much as possible at the begining of label 1 (get x as larger as possible reduces the loop number).

Finally what he calls estimators we (I) call it accumulators. He calls them estimator cause they accumulate each time a smaller number.

Salut Bordeaux de Rennes !

Algorithm used by __div_64_32

Answers (1)

Related Questions