Reputation: 19
I have A, a very large square NumPy matrix of size n in upper triangular form with non-negative entries above the diagonal.
How can I improve the performance of the following nested for-loops as much as possible:
import numpy as np

A = np.array([[1,2,3],[0,1,6],[0,0,1]],float)
n = len(A)
for k in range(1,n-1):
    for i in zip(*np.where(A[:,k]>0)):
        for j in zip(*np.where(A[k,:]>0)):
            if (A[i,j] < A[i,k]*A[k,j]):
                A[i,j]=A[i,k]*A[k,j]
A solution that avoids for-loops entirely would also be fine, if that is possible.
Upvotes: 1
Views: 181
Reputation: 6492
In this example I use Numba, but a Cython solution would also be possible.
Creating some data
import numpy as np
import numba as nb
A=np.random.rand(200*200).reshape(200,200)
A*=2
A=np.triu(A, k=0)
Code
Keep the code as simple as possible (avoid things like zip, itertools, list comprehensions, and lists in general).
@nb.njit()
def calc(A):
    n = len(A)
    for k in range(1,n-1):
        for i in range(n):
            if A[i,k]>0:
                for j in range(n):
                    if A[k,j]>0:
                        val=A[i,k]*A[k,j]
                        if (A[i,j] < val):
                            A[i,j]=val
    return A
Performance
non-compiled: 14.7 s
compiled: 3.3 ms (the first call has an overhead of about 0.5 s due to compilation)
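For comparison, the two inner loops can also be written with NumPy alone, which touches the question's wish to avoid explicit loops where possible. The following is only a sketch and was not part of the timings above. It assumes non-negative entries and diagonal entries no larger than 1 (as in the question's example), so that row k and column k do not change while step k is processed; with larger diagonal entries the in-place loop can give different results. The names calc_loops and calc_outer are chosen here for illustration.

```python
import numpy as np


def calc_loops(A):
    """Reference: the plain scalar triple loop, updating A in place."""
    n = len(A)
    for k in range(1, n - 1):
        for i in range(n):
            if A[i, k] > 0:
                for j in range(n):
                    if A[k, j] > 0:
                        val = A[i, k] * A[k, j]
                        if A[i, j] < val:
                            A[i, j] = val
    return A


def calc_outer(A):
    """Same update with the i/j loops replaced by an outer product.

    Assumption: entries are non-negative and A[k, k] <= 1, so a zero
    factor gives a zero product that never exceeds the existing entry.
    """
    n = len(A)
    for k in range(1, n - 1):
        # outer[i, j] == A[i, k] * A[k, j] for every pair (i, j)
        outer = np.outer(A[:, k], A[k, :])
        # element-wise maximum, written back into A
        np.maximum(A, outer, out=A)
    return A


A = np.array([[1, 2, 3], [0, 1, 6], [0, 0, 1]], float)
# Both functions modify their argument, so pass a copy to keep A intact.
print(calc_outer(A.copy()))
```

For the 3x3 example only k = 1 is visited, and the entry at position (0, 2) grows from 3 to 2 * 6 = 12. The loop over k remains because each step depends on the previous one, and each step builds a temporary n x n array, so for very large matrices the compiled scalar loop may still be the better choice.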
Upvotes: 1
Reputation: 16515
Well, one obvious small improvement concerns these lines:
if A[i, j] < A[i, k] * A[k, j]:
    A[i, j] = A[i, k] * A[k, j]
which can be improved to this:
aux = A[i, k] * A[k, j]
if A[i, j] < aux:
    A[i, j] = aux
Testing the two versions:
import numpy as np
import timeit


def f1(use_float, size):
    if use_float:
        A = np.random.rand(size, size)
    else:
        # random_integers is deprecated; randint excludes the upper bound,
        # so 101 keeps the values in 0..100
        A = np.random.randint(0, 101, (size, size))
    n = len(A)
    for k in range(1, n-1):
        for i in zip(*np.where(A[:, k] > 0)):
            for j in zip(*np.where(A[k, :] > 0)):
                if A[i, j] < A[i, k] * A[k, j]:
                    A[i, j] = A[i, k] * A[k, j]


def f2(use_float, size):
    if use_float:
        A = np.random.rand(size, size)
    else:
        A = np.random.randint(0, 101, (size, size))
    for k in range(1, len(A) - 1):
        for i in zip(*np.where(A[:, k] > 0)):
            for j in zip(*np.where(A[k, :] > 0)):
                # compute the product once and reuse it
                aux = A[i, k] * A[k, j]
                if A[i, j] < aux:
                    A[i, j] = aux


if __name__ == '__main__':
    setup = 'from __main__ import f{f} as f'
    statement = 'f({use_float}, {size})'
    number = 1000
    for use_float in (False, True):
        for size in [5, 50, 500, 5000, 50000]:
            # Format into a new name. Reassigning `statement` itself would
            # drop the placeholders after the first pass, so every later run
            # would time f(False, 5); the nearly constant timings reported
            # below are consistent with that. With this fix, expect the
            # larger sizes to be very slow at number=1000.
            stmt = statement.format(use_float=use_float, size=size)
            t1 = timeit.timeit(stmt, setup.format(f=1), number=number)
            t2 = timeit.timeit(stmt, setup.format(f=2), number=number)
            print('type {:5s} | size {:6d} | t1 {:8.4f} | t2 {:8.4f} | new vs old time {:5.2f} %'.format(
                'float' if use_float else 'int',
                size, t1, t2, t2 * 100 / t1))
This gives us the following results:
type int   | size      5 | t1   0.9130 | t2   0.7245 | new vs old time 79.35 %
type int   | size     50 | t1   0.9120 | t2   0.7278 | new vs old time 79.80 %
type int   | size    500 | t1   0.9048 | t2   0.7345 | new vs old time 81.17 %
type int   | size   5000 | t1   0.9340 | t2   0.7247 | new vs old time 77.59 %
type int   | size  50000 | t1   0.9148 | t2   0.7408 | new vs old time 80.99 %
type float | size      5 | t1   0.9141 | t2   0.7373 | new vs old time 80.66 %
type float | size     50 | t1   0.9212 | t2   0.7438 | new vs old time 80.74 %
type float | size    500 | t1   0.9481 | t2   0.7383 | new vs old time 77.86 %
type float | size   5000 | t1   0.9332 | t2   0.7393 | new vs old time 79.22 %
type float | size  50000 | t1   0.9267 | t2   0.7450 | new vs old time 80.39 %
showing an improvement of about 20%, which in my opinion is quite meaningful if you are looking to optimize.
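A variation on this comparison, offered as a sketch rather than a replacement for the figures above: it times only the update loops by creating the matrix once outside the timed call and handing each run a fresh copy. The function names, the bench helper, the seed, and the small sizes are assumptions made for illustration, and np.flatnonzero is swapped in for zip(*np.where(...)) so that i and j are plain integer indices.

```python
import timeit

import numpy as np


def update_recompute(A):
    """Evaluates the product twice whenever the comparison succeeds."""
    for k in range(1, len(A) - 1):
        for i in np.flatnonzero(A[:, k] > 0):
            for j in np.flatnonzero(A[k, :] > 0):
                if A[i, j] < A[i, k] * A[k, j]:
                    A[i, j] = A[i, k] * A[k, j]
    return A


def update_cached(A):
    """Evaluates the product once and reuses it."""
    for k in range(1, len(A) - 1):
        for i in np.flatnonzero(A[:, k] > 0):
            for j in np.flatnonzero(A[k, :] > 0):
                aux = A[i, k] * A[k, j]
                if A[i, j] < aux:
                    A[i, j] = aux
    return A


def bench(func, size, number=3, seed=0):
    """Total time of `number` runs of func on copies of one fixed matrix."""
    rng = np.random.default_rng(seed)
    A = rng.random((size, size))
    # The copy happens inside the timed call, but it is cheap next to the
    # loops and guarantees that every run starts from identical data.
    return timeit.timeit(lambda: func(A.copy()), number=number)


for size in (5, 20, 40):
    t1 = bench(update_recompute, size)
    t2 = bench(update_cached, size)
    print(f"size {size:3d} | t1 {t1:8.4f} | t2 {t2:8.4f} | new vs old time {100 * t2 / t1:6.2f} %")
```

Because both functions see the same input, their outputs can also be compared directly, which is a cheap check that the rewrite did not change the result.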
Upvotes: 1