Reputation: 806

Theoretical vs actual time-complexity for algorithm calculating 2^n

I am trying to compute the time-complexity and compare it with the actual computation times.

If I am not mistaken, the time-complexity is O(log(n)), but looking at the actual computation times it looks more like O(n) or even O(nlog(n)).

What could be reason for this difference?

def pow(n):
    """Return 2**n, where n is a nonnegative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    if n%2 == 0:
        return x*x
    return 2*x*x

Theoretical time-complexity:

Actual run times:

Upvotes: 11

Answers (7)

tevemadar

Reputation: 13195

You can generate textbook-like results if you count what they count, the steps taken:

def pow(n):
    global calls
    calls+=1
    """Return 2**n, where n is a nonnegative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    if n%2 == 0:
        return x*x
    return 2*x*x

def steppow(n):
    global calls
    calls=0
    pow(n)
    return calls

sx = [math.pow(10,n) for n in range(1,11)]
sy = [steppow(n)/math.log(n) for n in sx]
print(sy)

Then it produces something like this:

[2.1714724095162588, 1.737177927613007, 1.5924131003119235, 1.6286043071371943, 1.5634601348517065, 1.5200306866613815, 1.5510517210830421, 1.5200306866613813, 1.4959032154445342, 1.5200306866613813]

Where the 1.52... appears to be some kind of favourite.

But the actual runtime also includes the seemingly innocent mathematical operations, which grow in complexity too as the number physically grows in the memory. CPython uses a numer of multiplication implementations branching at various points:

long_mul is the entry:

if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
    stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
    return PyLong_FromLongLong((long long)v);
}

z = k_mul(a, b);

if the numbers fit into a CPU word, they get multiplied in place (but the result may be larger, hence the LongLong (*)), otherwise they will use k_mul() which stands for Karatsuba multiplication, which also checks a couple things based on size and value:

i = a == b ? KARATSUBA_SQUARE_CUTOFF : KARATSUBA_CUTOFF;
if (asize <= i) {
    if (asize == 0)
        return (PyLongObject *)PyLong_FromLong(0);
    else
        return x_mul(a, b);
}

for shorter numbers, a classic algorithm is used, x_mul(), and shortness-check also depends on the product being a square, beause x_mul() has an optimized code path for calculating x*x-like expressions. However, above a certain in-memory-size, the algorithm stays locally, but then it does another check about how different the magnitude of the two values are different:

if (2 * asize <= bsize)
    return k_lopsided_mul(a, b);

possibly branching to yet another algorithm, k_lopsided_mul(), which is still Karatsuba, but optimized for multiplying numbers with significant difference in magnitude.

In short, even the 2*x*x has significance, if you replace it with x*x*2, timeit results will differ:

2*x*x: [0.00020009249478223623, 0.0002965123323532072, 0.00034258906889154733, 0.0024181753953639975, 0.03395215528201522, 0.4794894526936972, 4.802882867816082]
x*x*2: [0.00014974939375012042, 0.00020265231347948998, 0.00034002925019471775, 0.0024501731290706985, 0.03400164511014836, 0.462764023966729, 4.841786565730171]

(measured as

sx = [math.pow(10,n) for n in range(1,8)]
sy = [timeit.timeit('pow(%d)' % i, number=100, globals=globals()) for i in sx]

)

(*) by the way, as the size of the result is often overestimated (like at the very beginning, long*long may or may not fit into a long afterwards), there is a long_normalize function too, which at the end does spend time on freeing extra memory (see comment above it), but still sets the correct size for the internal object, which involves a loop counting zeroes in front of the actual number.

Upvotes: 0

imposeren

Reputation: 4392

For some integer values python will internally use "long reperesentation" And in your case this happens somewhere after n=63, so your theoretical time complexity should be correct only for values of n < 63.

For "long representation" multiplying 2 numbers (x * y) have complexity bigger than O(1):

for x == y (e.g. x*x) complexity is around O(Py_SIZE(x)² / 2).
for x != y (e.g. 2*x) multiplication is performed like "Schoolbook long multiplication", so complexity will be O(Py_SIZE(x)*Py_SIZE(y)). In your case it might a little affect performance too, because 2*x*x will do (2*x)*x, while faster way would be to do 2*(x*x)

And so for n>=63 theoretical complexity must also account for complexity of multiplications.

It's possible to measure "pure" complexity of custom pow (ignoring complexity of multiplication) if you can reduce complexity of multiplication to O(1). For example:

SQUARE_CACHE = {}
HALFS_CACHE = {}


def square_and_double(x, do_double=False):
    key = hash((x, do_double))
    if key not in SQUARE_CACHE:
        if do_double:
            SQUARE_CACHE[key] = 2 * square_and_double(x, False)
        else:
            SQUARE_CACHE[key] = x*x
    return SQUARE_CACHE[key]


def half_and_remainder(x):
    key = hash(x)
    if key not in HALFS_CACHE:
        HALFS_CACHE[key] = divmod(x, 2)
    return HALFS_CACHE[key]


def pow(n):
    """Return 2**n, where n is a non-negative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    return square_and_double(x, do_double=bool(n % 2 != 0))


def pow_alt(n):
    """Return 2**n, where n is a non-negative integer."""
    if n == 0:
        return 1
    half_n, remainder = half_and_remainder(n)
    x = pow_alt(half_n)
    return square_and_double(x, do_double=bool(remainder != 0))

import timeit
import math


# Values of n:
sx = sorted([int(x) for x in [100, 1000, 10e4, 10e5, 5e5, 10e6, 2e6, 5e6, 10e7, 10e8, 10e9]])

# Fill caches of `square_and_double` and `half_and_remainder` to ensure that complexity of both `x*x` and of `divmod(x, 2)` are O(1):
[pow_alt(n) for n in sx]

# Average runtime in ms:
sy = [timeit.timeit('pow_alt(%d)' % n, number=500, globals=globals())*1000 for n in sx]

# Theoretical values:
base = 2
sy_theory = [sy[0]]
t0 = sy[0] / (math.log(sx[0], base))
sy_theory.extend([
    t0*math.log(x, base)
    for x in sx[1:]
])
print("real timings:")
print(sy)
print("\ntheory timings:")
print(sy_theory)

print('\n\nt/t_prev:')
print("real:")
print(['--' if i == 0 else "%.2f" % (sy[i]/sy[i-1]) for i in range(len(sy))])
print("\ntheory:")
print(['--' if i == 0 else "%.2f" % (sy_theory[i]/sy_theory[i-1]) for i in range(len(sy_theory))])

# OUTPUT:
real timings:
[1.7171500003314577, 2.515988002414815, 4.5264500004122965, 4.929114998958539, 5.251838003459852, 5.606903003354091, 6.680275000690017, 6.948587004444562, 7.609975000377744, 8.97067000187235, 16.48820400441764]

theory timings:
[1.7171500003314577, 2.5757250004971866, 4.292875000828644, 4.892993172417281, 5.151450000994373, 5.409906829571465, 5.751568172583011, 6.010025001160103, 6.868600001325832, 7.727175001491561, 8.585750001657289]


t/t_prev:
real:
['--', '1.47', '1.80', '1.09', '1.07', '1.07', '1.19', '1.04', '1.10', '1.18', '1.84']

theory:
['--', '1.50', '1.67', '1.14', '1.05', '1.05', '1.06', '1.04', '1.14', '1.12', '1.11']

Results are still not perfect but close to theoretical O(log(n))

Upvotes: 1

Alain T.

Reputation: 42143

The number of iterations will be log(n,2) but each iteration needs to perform a multiplication between two numbers that are twice as large as the preceding iteration's.

The best multiplication algorithms for variable precision numbers perform in O(N * log(N) * log(log(N))) or O(N^log(3)) where N is the number of digits (bits or words) needed to represent the number. It would seem that the two complexities combine to produce execution times that are larger than O(log(n)) in practice.

The digit count of the two numbers at each iteration is 2^i. So the total time will be the sum of multiplication (x*x) complexities for numbers going through the log(n) iterations

To compute the function's time complexity based on the Schönhage–Strassen multiplications algorithm, we would need to add the time complexity of each iteration using : O(N * log(N) * log(log(N))):

∑ 2^i * log(2^i) * log(log(2^i)) [i = 0...log(n)]

∑ 2^i * i * log(i) [i = 0...log(n)]

which would be quite complex, so let's look at a simpler scenario.

If Python's variable precision multiplications used the most naive O(N^2) algorithm, the worst case time could be expressed as:

∑ (2^i)^2 [i = 0...log(n)]

∑ 4^i [i = 0...log(n)]    

(4^(log(n)+1)-1)/3    # because ∑K^i [i=0..n] = (K^(n+1)-1)/(K-1)

( 4*4^log(n) - 1 ) / 3

( 4*(2^log(n))^2 - 1 ) / 3    

(4*n^2-1)/3   # 2^log(n) = n

(4/3)*n^2-1/3

This would be O(n^2), which suggests that the log(n) iteration time cancels itself out in favour of the multiplication's complexity profile.

We get the same result if we apply this reasoning to the Karatsuba multiplication algorithm: O(N^log(3)):

∑ (2^i)^log(3)    [i=0..log(n)]

∑ (2^log(3))^i    [i=0..log(n)]

∑ 3^i             [i=0..log(n)]

( 3^(log(n)+1) - 1 ) / 2   # because ∑K^i [i=0..n] = (K^(n+1)-1)/(K-1)

( 3*3^log(n) - 1 ) / 2

( 3*(2^log(3))^log(n) - 1 ) / 2

( 3*(2^log(n))^log(3) - 1 ) / 2

(3/2)*n^log(3) - 1/2

which corresponds to O(n^log(3)) and corroborates the theory.

Note that the last column of your measurement table is misleading because you're making n progress exponentially. This changes the meaning of t[i]/t[i-1] and its interpretation for the evaluation of time complexity. It would be more meaningful if the progression between N[i] and N[i-1] was linear.
Taking into account the N[i]/N[i-1] ratio in the calculation, I found that the results seem to correlate more with O(n^log(3)) which would suggest that Python uses Karatsuba for large integer multiplications. (for version 3.7.1 on MacOS) However, this correlation is very weak.

FINAL ANSWER: O(log(N))

After doing more tests, I realized that there are wild variations in the time taken to multiply large numbers. Sometimes larger numbers take considerably less time than smaller ones. This makes the timing figures suspect and correlation to a time complexity based on a small and irregular sample is not going to be conclusive.
With a larger and more evenly distributed sample, the time strongly correlates (0.99) with log(N). This would mean the differences introduced by multiplication overhead only impact fixed points in the value range. Intentionally selecting values of N that are orders of magnitude apart exacerbated the impact of these fixed points thus skewing the results.

So you can ignore all the nice theories I wrote above, because the data shows that the time complexity is indeed Log(n). You just have to use a more meaningful sample (and better rate of change calculations).

Upvotes: 6

chepner

Reputation: 531345

You have to consider the true input size of the function. It's not the magnitude of n, but the number of bits needed to represent n, which is logarithmic in the magnitude. That is, dividing a number by 2 doesn't cut the input size in half: it only reduces it by 1 bit. This means that for an n-bit number (whose value is between 2^n and 2^(n+1)), the running time is indeed logarithmic in magnitude, but linear in the number of bits.

    n       lg n                 bits to represent n
    --------------------------------------
    10     between 2 and 3      4 (1010)
   100     between 4 and 5      7 (1100100)
  1000     just under  7       10 (1111101000)
 10000     between 9 and 10    14 (10011100010000)

Each time you multiply n by 10, you are only increasing the input size by 3-4 bits, roughly a factor of 2, not a factor of 10.

Upvotes: 1

knh190

Reputation: 2882

I was suspecting your time calculation is not accurate, so I did it using timeit, here're my stats:

import timeit
# N
sx = [10, 100, 1000, 10e4, 10e5, 5e5, 10e6, 2e6, 5e6]
# average runtime in seconds
sy = [timeit.timeit('pow(%d)' % i, number=100, globals=globals()) for i in sx]

Update:

Well, the code did run with O(n*log(n))...! A possible explanation is that multiplication / division is not O(1) for large numbers so this part doesn't hold:

T(n) = 1 + T(n//2)
     = 1 + 1 + T(n//4)
#      ^   ^
#     mul>1
#         div>1
# when n is large

Experiment with multiplication and division:

mul = lambda x: x*x
div = lambda y: x//2

s1 = [timeit.timeit('mul(%d)' % i, number=1000, globals=globals()) for i in sx]
s2 = [timeit.timeit('div(%d)' % i, number=1000, globals=globals()) for i in sx]

And plots, same for mul and div - they are not O(1) (?) small integers seem to be more efficient but no big difference for large integers. I don't know what could be the cause then. (though, I should keep the answer here if it can help)

Upvotes: 9

Евгений Попович

Reputation: 86

Its because multiply 2 small numbers its O(1). But multiply 2 long number (N - num)O(log(N)**2). https://en.wikipedia.org/wiki/Multiplication_algorithm So in each step time increase not O(log(N))

Upvotes: 4

Evan Gertis

Reputation: 2052

This can be complex, but there are different cases that you will have to examine for different values of n since this is recursive. This should explain it https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms).

Upvotes: 1

Theoretical vs actual time-complexity for algorithm calculating 2^n

Answers (7)

Related Questions