Why does this simple 64-bit/32-bit division algorithm using 32-bit division work?

Question

I was looking for a 64-bit/32-bit division algorithm preferably with a small code size for a 32-bit x86 machine. I found this surprisingly simple one, but couldn't see how it works.

The algorithm is as follows.

We are calculating x / y where x is 64-bit and y is 32-bit. Both are unsigned.
x.l is the lower bits of x, and x.h is the higher bits of x, assuming a little-endian machine.
h <- x.h / y
x.h <- x.h % y
x.l <- x / y, (x.h <- x % y); 64-bit/32-bit division only works when the quotient fits in 32 bits, which seems to be true in this case.
x.h <- h
return x

My math is just too short to easily get this algorithm. Could you help me to understand this algorithm?

This is a test program I wrote to see if it works.

#include 
#include 
#include 

typedef union {
    uint64_t q;
    struct {
        uint32_t l;
        uint32_t h;
    };
} uint64_u;

uint64_t div_(uint64_t _x, uint32_t y) {
    uint64_u x = {_x};
    uint32_t h = x.h / y;
    x.h = x.h % y;
    __asm__ (
        "div %2":
        "+a" (x.l),
        "+d" (x.h):
        "r" (y)
    );
    x.h = h;
    return x.q;
}

int main() {
    for (int i = 0; i < 1000000; ++i) {
        uint64_t x = (uint64_t)rand() << 32 | rand();
        uint32_t y = rand() % RAND_MAX + 1;
        uint64_t r0 = x / y;
        uint64_t r1 = div_(x, y);
        if (r0 != r1) {
            abort();
        }
    }
    return 0;
}

chqrlie · Accepted Answer

The algorithm is an adaptation of the classic long division method taught in high schools:

instead of base 10, use base 2³²
dividing a 64-bit dividend by a 32-bit divisor is akin to dividing a 2 digit number A:B by a one digit number C
take the first digit A and divide that by the dividend C: in A goes D times C, remainder E
take the next digit B: in E:B goes F times C, remainder G (ignored in the code, but could be returned via a pointer)
the result is D:F as constructed in the code.

The code uses a div instruction that takes a 64-bit operand EDX:EAX constructed as above from (x.h % y):(x.l). See https://godbolt.org/z/7x9K8dzsf

A fully portable version, without assembly and using only 32-bit operations is more difficult to write efficiently.

Why does this simple 64-bit/32-bit division algorithm using 32-bit division work?

Answers (1)

Related Questions