Reputation: 29
I'm currently in an algorithms class and was interested to see which of two methods of multiplying a list of large numbers gives the faster runtime. What I found was that the recursive multiply performs about 10x faster. For the code below, I got t_sim=53.05s and t_rec=4.73s. I did some other tests and they all seemed to be around the 10x range.
Additionally, you could put the values from the recursive multiply into a tree and reuse them to even more quickly compute multiplications of subsets of the list.
I did a theoretical runtime analysis, and both are n^2 using standard multiplication, but when the karatsuba algorithm is used, that factor goes down to n^log_2(3).
Every multiply in simple_multiply should have runtime i * 1. Summing over i=1...n, we get an arithmetic series and can use gauss's formula to get n*(n+1)/2 = O(n^2).
For the second one, we can see that the time to multiply for a given level of recursion is (2^d)^2, where d is the depth, but only needs to multiply n*2^-d values. The levels turn out to form a geometric series where the runtime at each level is n*2^d with a final depth of log_2(n). The solution to the geometric series is n * (1-2^log_2(n))/(1-2) = n*(n-1) = O(n^2). If using the karatsuba algorithm, you can get O(n^log_2(3)) by doing the same method
If the code were using the karatsuba algorithm, then the speedup would make sense, but what doesn't seem to make sense is the linear relationship between the two runtimes, making it seem like python is using standard multiplication, which according to wikipedia is faster when using under 500ish bits. (I'm using 2^23 bits in the code below. Each number is literally a megabyte long)
import random
import time
def simple_multiply(values):
a = 1
for val in values:
a *= val
return a
def recursive_multiply(values):
if len(values) == 1:
return values[0]
temp = []
i = 0
while i + 1 < len(values):
temp.append(values[i] * values[i+1])
i += 2
if len(values) % 2 == 1:
temp.append(values[-1])
return recursive_multiply(temp)
def test(func, values):
t1 = time.time()
func(values)
print( time.time() - t1)
def main():
n = 2**11
scale = 2**12
values = [random.getrandbits(scale) for i in range(n)]
test(simple_multiply, values)
test(recursive_multiply, values)
pass
if __name__ == '__main__':
main()
Upvotes: 2
Views: 133
Reputation: 5958
There is something wrong in your assumption. Your assumption is that each multiplication of the between the different ranks should take same times, for instance len(24)*len(72)
approx len(48)*len(48)
. But that's not true, as evident by the following snippets:
%%timeit
random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)
>>>1000 loops, best of 3: 1.48 ms per loop
%%timeit
(random.getrandbits(2**14)*random.getrandbits(2**14))*(random.getrandbits(2**14)*random.getrandbits(2**14))
>>>1000 loops, best of 3: 1.23 ms per loop
The difference is consistent even on such a small scale
Upvotes: 0
Reputation: 59263
Both versions of the code have the same number of multiplications, but in the simple version each multiplication is ~2000 bits long on average.
In the second version n/2 multiplications are 24 bits long, n/4 are 48 bits long, n/8 are 96 bits long, etc... The average length is only 48 bits.
Upvotes: 2