Reputation: 140
I am writing a code IN Python to compute the discrete Laplacian as a sparse matrix in 2D. The code is as follows:
def discretizeLaplacian(N, step):
NN = N**2
dxx = (float) (1/((step)**2))
i_loc, j_loc, vals = np.array([]), np.array([]), np.array([])
for i in range(1, N-1):
for j in range(1, N-1):
iv0 = j*N + i
iv1 = (j - 1)*N + i
iv2 = (j + 1)*N + i
iv3 = j*N + i - 1
iv4 = j*N + i + 1
i_loc = np.concatenate((i_loc,[iv0, iv0, iv0, iv0, iv0]))
j_loc = np.concatenate((j_loc,[iv0, iv1, iv2, iv3, iv4]))
vals = np.concatenate((vals,dxx*np.array([-4, 1, 1, 1, 1])))
sparseMatrix = csc_matrix((vals, (i_loc, j_loc)), shape = (NN, NN))
return sparseMatrix
The for loop runs much longer compared to Matlab. I am wondering if I can write it as nested list comprehension to make it faster.
Upvotes: 2
Views: 136
Reputation: 50298
The slow performance comes from the bad complexity of the algorithm. Indeed, the complexity of the original code is O(N**4)
! This is due to np.concatenate
which creates a new array by copying the old one and adding a few items at the end of the new one. This means that O(N**2)
copies of 3 growing arrays are performed. In general, you should avoid np.concatenate
in a loop to make a growing array. You should use Python lists in that case.
Note that you can use np.tile
to repeat values of an array and pre-compute the constant dxx * np.array([-4, 1, 1, 1, 1])
.
Here is the corrected code:
def discretizeLaplacian(N, step):
NN = N**2
dxx = (float) (1/((step)**2))
tmp = dxx * np.array([-4, 1, 1, 1, 1])
i_loc, j_loc = [], []
for i in range(1, N-1):
for j in range(1, N-1):
iv0 = j*N + I
iv1 = (j - 1)*N + i
iv2 = (j + 1)*N + i
iv3 = j*N + i - 1
iv4 = j*N + i + 1
i_loc.extend([iv0, iv0, iv0, iv0, iv0])
j_loc.extend([iv0, iv1, iv2, iv3, iv4])
i_loc = np.array(i_loc, dtype=np.float64)
j_loc = np.array(j_loc, dtype=np.float64)
vals = np.tile(tmp, (N-2)**2)
sparseMatrix = csc_matrix((vals, (i_loc, j_loc)), shape = (NN, NN))
return sparseMatrix
Here are performance results on my machine with N=200:
Original code: 22.510 s
Corrected code: 0.068 s
While you can use list comprehensions to set i_loc
with [j*N+I for i in range(1, N-1) for j in range(1, N-1)]
, it is not easy for j_loc
due to the multiple items to append. You can build a list of list/tuple and then flatten it, but it makes the code less readable and not faster.
If this is not fast enough, you can use Numpy vectorized functions (such as np.meshgrid
, np.arange
, and basic math operations) to build i_loc
and j_loc
. Please consider reading the Numpy tutorials about that. Python loops are generally very slow since Python codes are generally interpreted. Matlab uses internally a just-in-time compiler (JIT) to execute such a loop-based code quickly. You can use Numba or Cython to do the same thing in Python.
Upvotes: 2