Craig Gidney
Craig Gidney

Reputation: 18296

Where is the mistake in this apparently-O(n lg n) multiplication algorithm?

A recent puzzle blog post about finding three evenly spaced ones lead me to a stackoverflow question with a top answer that claims to do it in O(n lg n) time. The interesting part is that the solution involves squaring a polynomial, referencing a paper that describes how to do it in O(n lg n) time.

Now, multiplying polynomials is practically the same as multiplying numbers. The only real difference is the lack of carries. But... the carries can also be done in O(n lg n) time. For example:

    var value = 100; // = 0b1100100

    var inputBitCount = value.BitCount(); // 7 (because 2^7 > 100 >= 2^6)
    var n = inputBitCount * 2; // 14
    var lgn = n.BitCount(); // 4 (because 2^4 > 14 => 2^3)
    var c = lgn + 1; //5; enough space for 2n carries without overflowing

    // do apparently O(n log n) polynomial multiplication
    var p = ToPolynomialWhereBitsAreCoefficients(value); // x^6 + x^5 + x^2
    var p2 = SquarePolynomialInNLogNUsingFFT(p); // x^12 + 2x^11 + 2x^10 + x^8 + 2x^7 + x^4
    var s = CoefficientsOfPolynomial(p2); // [0,0,0,0,1,0,0,2,1,0,2,2,1]
    // note: s takes O(n lg n) space to store (each value requires at most c-1 bits)

    // propagate carries in O(n c) = O(n lg n) time
    for (var i = 0; i < n; i++)
        for (var j = 1; j < c; j++)
            if (s[i].Bit(j))
                s[i + j].IncrementInPlace();

    // extract bits of result (in little endian order)
    var r = new bool[n];
    for (var i = 0; i < n; i++)
        r[i] = s[i].Bit(0);

    // r encodes 0b10011100010000 = 10000

So my question is this: where's the mistake, here? Multiplying numbers in O(n lg n) is a gigantic open problem in computer science, and I really really doubt the answer would be this simple.

I've actually written code to implement this, except for the efficient polynomial multiplication (I don't understand the number theoretic transform well enough yet). Random testing appears to confirm the algorithm being correct, so the issue is likely in the time complexity analysis.

Upvotes: 1

Views: 392

Answers (1)

tmyklebu
tmyklebu

Reputation: 14215

The problem is that the SquarePolynomialUsingFFT step can't actually be done in O(n log(n)) time if n is the number of bits under either the word-RAM or the bit-RAM model. In the word-RAM model (i.e. common operations on log(n)-bit words have unit cost), you've just blown your n words into n log(n) words and then you doing an FFT that take O(n log2(n)) time. In the bit-RAM model (bit operations have unit cost), each operation in the FFT takes O(log n) time, so the whole FFT takes O(n log2(n)) time.

The magic in Schoenhage-Strassen is the clever recursive choice of rings over which the FFT is taken so that you don't get a further log(n) blowup, but only a log(log(n)) blowup. If you have k m-word ring elements (so that k*m = n), you only need O(m*k log(k)) time to do m FFT's in that ring. Then you have to compute k^2 related m-word convolutions if you do this, but that's cool; we can FFT (using a smaller ring) all k of the m-word ring elements in bulk, do the multiplication, and inverse-FFT back.

Upvotes: 3

Related Questions