Reputation: 11
I'm trying to figure out how to multiply matrices really fast in Python without using NumPy. For this reason, I've implemented the Strassen algorithm and compared it with the standard loop-based multiplication. I compare only square matrices of size NxN where N is 2^k. Surprisingly, my Strassen algorithm is about 3.5 times slower than the standard one (I use `perf_counter` from the `time` module for timing). As an example, the average values are approximately as follows:
| matrix size | Strassen | standard mult |
|---|---|---|
| 16x16 | 0.006 sec | 0.002 sec |
| 32x32 | 0.036 sec | 0.013 sec |
| 64x64 | 0.26 sec | 0.07 sec |
| 128x128 | 1.69 sec | 0.49 sec |
| 1024x1024 | 771.42 sec | 221.09 sec |
I generated the matrix elements with `randint(1, 9)`, but in any case I started timing only after creating the test matrices and stopped before printing the result matrix with loops, so I only measured the time spent in my functions. And yes, I've seen some other posts about the same problem, like this one:
Why is Strassen matrix multiplication so much slower than standard matrix multiplication?
but I can't say it was very helpful for me. Also, it would be really cool to just change something in my current algorithm to optimize it, instead of writing a new one.
Strassen algorithm
```python
from time import perf_counter
from random import randint

n = 16  # matrix size, a power of two
matrix1 = [[randint(1, 9) for _ in range(n)] for _ in range(n)]
matrix2 = [[randint(1, 9) for _ in range(n)] for _ in range(n)]

def submatrices(n, matrix):  # dividing the matrix into quadrants
    A = [[j for j in matrix[i][:int(n / 2)]] for i in range(int(n / 2))]
    B = [[j for j in matrix[i][int(n / 2):]] for i in range(int(n / 2))]
    C = [[j for j in matrix[i][:int(n / 2)]] for i in range(int(n / 2), n)]
    D = [[j for j in matrix[i][int(n / 2):]] for i in range(int(n / 2), n)]
    return [A, B, C, D]

def addition(n, matrix1, matrix2):  # element-wise addition
    res = [[matrix1[i][j] + matrix2[i][j] for j in range(n)] for i in range(n)]
    return res

def subtraction(n, matrix1, matrix2):  # element-wise subtraction
    res = [[matrix1[i][j] - matrix2[i][j] for j in range(n)] for i in range(n)]
    return res

def strassen(n, matrix1, matrix2):
    if n == 2:  # the last step of the algorithm is just standard multiplication
        xy = [[0] * n for i in range(n)]
        for i in range(n):
            for j in range(n):
                for x in range(n):
                    xy[i][j] += matrix1[i][x] * matrix2[x][j]
    else:
        A, B, C, D = submatrices(n, matrix1)  # divide the original matrix1
        E, F, G, H = submatrices(n, matrix2)  # divide the original matrix2
        n = int(n / 2)  # the block size is halved now
        p1 = strassen(n, A, subtraction(n, F, H))
        p2 = strassen(n, addition(n, A, B), H)
        p3 = strassen(n, addition(n, C, D), E)
        p4 = strassen(n, D, subtraction(n, G, E))
        p5 = strassen(n, addition(n, A, D), addition(n, E, H))
        p6 = strassen(n, subtraction(n, B, D), addition(n, G, H))
        p7 = strassen(n, subtraction(n, A, C), addition(n, E, F))
        xy1 = addition(n, addition(n, p5, p6), subtraction(n, p4, p2))  # making new blocks of the matrix
        xy2 = addition(n, p1, p2)
        xy3 = addition(n, p3, p4)
        xy4 = subtraction(n, addition(n, p1, p5), addition(n, p3, p7))
        xy = [xy1[i] + xy2[i] for i in range(n)] + [xy3[i] + xy4[i] for i in range(n)]  # assembling the matrix from blocks
    return xy

t_start = perf_counter()
result = strassen(n, matrix1, matrix2)
print(f'Time: {perf_counter() - t_start} sec')

# printing the result
for row in result:
    print(*row)
```
Standard multiplication
```python
def multiply(n, matrix1, matrix2):
    res = [[0] * n for i in range(n)]
    for i in range(n):
        for j in range(n):
            for x in range(n):
                res[i][j] += matrix1[i][x] * matrix2[x][j]
    return res

t_start = perf_counter()
result = multiply(n, matrix1, matrix2)
print(f'Time: {perf_counter() - t_start} sec')

for row in result:
    print(*row)
```
Upvotes: 1
Views: 994
Reputation: 50816
Both are horribly inefficient (certainly at least 3-4 orders of magnitude slower than optimized implementations). The standard implementation of Python is the CPython interpreter, which is clearly not designed for this kind of computation. It is mainly meant to execute glue code that calls C functions, like the ones of BLAS libraries. In practice, accessing a list with `lst[i][j]` causes many function calls, memory indirections, object allocations/destructions, etc. All these overheads are huge compared to the same accesses in a natively compiled language (like C/C++), and they are also hard to track without a good understanding of the interpreter (another Python implementation will certainly give completely different results). One issue with the Strassen implementation is recursion: recursion is pretty slow in CPython (you can write a naive recursive Fibonacci implementation to easily check that). Another big issue is the creation of the many temporary sub-lists of lists, which should be very slow too (a lot of objects need to be allocated).
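Both of those costs can at least be reduced without leaving pure Python. Here is a sketch of my own (not code from the question, though it uses the same seven Strassen products): raising the base-case cutoff from `n == 2` to something like `n <= 64` removes almost all recursive calls and temporary sub-lists. The `CUTOFF` value is a guess and would need to be tuned by measurement:

```python
# Sketch: Strassen for square 2^k matrices (lists of lists), but the
# recursion switches to the classic triple loop well before n == 2.
CUTOFF = 64  # assumed value; the best cutoff must be measured

def classic(n, m1, m2):
    # Triple loop in i, x, j order so each row is fetched once per pass.
    res = [[0] * n for _ in range(n)]
    for i in range(n):
        for x in range(n):
            a, row2, r = m1[i][x], m2[x], res[i]
            for j in range(n):
                r[j] += a * row2[j]
    return res

def strassen(n, m1, m2):
    if n <= CUTOFF:                       # was: if n == 2
        return classic(n, m1, m2)
    h = n // 2
    def add(x, y):
        return [[x[i][j] + y[i][j] for j in range(h)] for i in range(h)]
    def sub(x, y):
        return [[x[i][j] - y[i][j] for j in range(h)] for i in range(h)]
    # Quadrants of each operand (same layout as in the question).
    A = [row[:h] for row in m1[:h]]; B = [row[h:] for row in m1[:h]]
    C = [row[:h] for row in m1[h:]]; D = [row[h:] for row in m1[h:]]
    E = [row[:h] for row in m2[:h]]; F = [row[h:] for row in m2[:h]]
    G = [row[:h] for row in m2[h:]]; H = [row[h:] for row in m2[h:]]
    p1 = strassen(h, A, sub(F, H))
    p2 = strassen(h, add(A, B), H)
    p3 = strassen(h, add(C, D), E)
    p4 = strassen(h, D, sub(G, E))
    p5 = strassen(h, add(A, D), add(E, H))
    p6 = strassen(h, sub(B, D), add(G, H))
    p7 = strassen(h, sub(A, C), add(E, F))
    xy1 = add(add(p5, p6), sub(p4, p2))   # same block assembly as the question
    xy2 = add(p1, p2)
    xy3 = add(p3, p4)
    xy4 = sub(add(p1, p5), add(p3, p7))
    return [xy1[i] + xy2[i] for i in range(h)] + [xy3[i] + xy4[i] for i in range(h)]
```

This does not change the asymptotic picture, it only shrinks the constant factors that CPython makes so large; below the cutoff the work is the same triple loop the question already benchmarks.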
If you want to compare such computationally intensive algorithms, please use a natively compiled language. The outcome of such a comparison in Python is irrelevant for other languages and very dependent on the details of the target Python implementation. Note that Strassen tends to pay off only for huge matrices (generally much bigger than 1024x1024).
Besides, a first step could be to use NumPy to create views of the sub-matrices with a much smaller overhead. The operations on matrices should also be far faster, and even simpler to implement.
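To illustrate that idea (a sketch under the assumption that NumPy is acceptable, even though the question rules it out): slicing a NumPy array yields a view, so splitting into quadrants allocates nothing, and the block additions/subtractions run in C. The `cutoff=64` default is again just a guess:

```python
import numpy as np

def strassen_np(a, b, cutoff=64):
    """Strassen for square 2^k matrices using NumPy views for the quadrants.

    Below the cutoff, plain matrix multiplication (a @ b) is used.
    """
    n = a.shape[0]
    if n <= cutoff:
        return a @ b
    h = n // 2
    # Quadrants: these slices are views, no data is copied.
    A, B, C, D = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
    E, F, G, H = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]
    p1 = strassen_np(A, F - H, cutoff)
    p2 = strassen_np(A + B, H, cutoff)
    p3 = strassen_np(C + D, E, cutoff)
    p4 = strassen_np(D, G - E, cutoff)
    p5 = strassen_np(A + D, E + H, cutoff)
    p6 = strassen_np(B - D, G + H, cutoff)
    p7 = strassen_np(A - C, E + F, cutoff)
    # Assemble the result blocks in place (same combinations as the question).
    out = np.empty((n, n), dtype=a.dtype)
    out[:h, :h] = p5 + p6 + p4 - p2
    out[:h, h:] = p1 + p2
    out[h:, :h] = p3 + p4
    out[h:, h:] = p1 + p5 - p3 - p7
    return out
```

Of course, for any realistic size `a @ b` alone (backed by BLAS) will beat this recursion by a wide margin; the point is only to show how views remove the sub-matrix copying cost.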
Upvotes: 1