Reputation: 1
I'm trying to make an Abelian sandpile using numpy arrays in Python. The calculation speed is okay for smaller square matrices, but it slows down significantly for larger ones (a 200x200 matrix with 20000 initial sand particles takes up to 20-30 minutes). Is there any way to speed up / optimize the matrix calculation? The threshold value is 3.
The basic code right now is:
import numpy as np

n = 200
size = (n, n)
x = np.zeros(size)

m = 0  # mean
if n % 2 == 0:
    m = int((n + 1) / 2)
else:
    m = int(n / 2)

x[m][m] = 100000
z = int(x[m][m])

def f(x):
    count = 0
    for i in range(0, n):
        for j in range(0, n):
            if x[i][j] > 3:
                x[i][j] = x[i][j] - 4
                if i - 1 >= 0:
                    x[i-1][j] = x[i-1][j] + 1
                if i + 1 < n:
                    x[i+1][j] = x[i+1][j] + 1
                if j - 1 >= 0:
                    x[i][j-1] = x[i][j-1] + 1
                if j + 1 < n:
                    x[i][j+1] = x[i][j+1] + 1
            elif x[i][j] <= 3:
                count = count + 1
    return x, count

for k in range(0, z):
    y, count = f(x)
    if count == n**2:
        break
    elif count < n**2:
        continue

print(y)
print(y)
I've tried running a 500x500 matrix with 100,000 initial particles, but that took more than 6 hours.
Upvotes: 0
Views: 266
Reputation: 114310
While vectorising with numpy will not necessarily cut down your algorithmic complexity, it will probably cut down your overhead by a factor of at least a few dozen. As a general rule of thumb, if you find yourself writing explicit loops or using explicit if statements, you should consider rethinking your approach.
What may help you here is simple masking to implement the toppling. If you have the toppling sites marked with a 1 in a mask of the same shape as x, you can directly subtract off the toppled piles and add the distributed sand just by shifting the mask:
mask = (x >= 4)
x[mask] -= 4
x[:, :-1] += mask[:, 1:] # topple left
x[:, 1:] += mask[:, :-1] # topple right
x[:-1, :] += mask[1:, :] # topple up
x[1:, :] += mask[:-1, :] # topple down
If count is just the number of untoppled sites, you can obtain it from the mask with np.count_nonzero (the total number of sites minus the toppling ones):
count = x.size - np.count_nonzero(mask)
If, on the other hand, you are using count to determine when to stop your loop, you might find it easier to switch to counting how many toppling sites there are:
count = np.sum(mask)
The outer loop terminates when this version of count reaches zero (or the original version reaches x.size).
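Putting the pieces together, a minimal sketch of a fully vectorised relaxation loop could look like the following (the seeding of the central pile and the while-loop structure are my assumptions about how you drive the simulation, based on the code in the question):
import numpy as np

n = 200
x = np.zeros((n, n))
x[n // 2, n // 2] = 100000  # seed the central pile (as in the question)

while True:
    mask = (x >= 4)            # sites above the threshold that will topple this sweep
    if not mask.any():         # nothing left to topple: the pile is stable
        break
    x[mask] -= 4               # each toppling site gives away 4 grains...
    x[:, :-1] += mask[:, 1:]   # ...one to the left neighbour
    x[:, 1:] += mask[:, :-1]   # ...one to the right neighbour
    x[:-1, :] += mask[1:, :]   # ...one to the neighbour above
    x[1:, :] += mask[:-1, :]   # ...one to the neighbour below

print(x)
Grains pushed past the edge are simply lost, which matches the boundary handling of the explicit loop in the question, and because the sandpile is abelian, toppling all over-threshold sites in one sweep reaches the same final configuration as toppling them one at a time.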
Upvotes: 0
Reputation: 5408
You can use numba for this purpose (adding nopython=True or using static type signatures gives a further speedup):
from numba import jit
import numpy as np

n = 200
size = (n, n)
x = np.zeros(size)

m = 0  # mean
if n % 2 == 0:
    m = int((n + 1) / 2)
else:
    m = int(n / 2)

x[m][m] = 100000
z = int(x[m][m])

def f(x):
    count = 0
    for i in range(0, n):
        for j in range(0, n):
            if x[i][j] > 3:
                x[i][j] = x[i][j] - 4
                if i - 1 >= 0:
                    x[i-1][j] = x[i-1][j] + 1
                if i + 1 < n:
                    x[i+1][j] = x[i+1][j] + 1
                if j - 1 >= 0:
                    x[i][j-1] = x[i][j-1] + 1
                if j + 1 < n:
                    x[i][j+1] = x[i][j+1] + 1
            elif x[i][j] <= 3:
                count = count + 1
    return x, count

@jit
def f_jit(x):
    count = 0
    for i in range(0, n):
        for j in range(0, n):
            if x[i][j] > 3:
                x[i][j] = x[i][j] - 4
                if i - 1 >= 0:
                    x[i-1][j] = x[i-1][j] + 1
                if i + 1 < n:
                    x[i+1][j] = x[i+1][j] + 1
                if j - 1 >= 0:
                    x[i][j-1] = x[i][j-1] + 1
                if j + 1 < n:
                    x[i][j+1] = x[i][j+1] + 1
            elif x[i][j] <= 3:
                count = count + 1
    return x, count
%%timeit
f(x)
28.7 ms ± 602 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
f_jit(x)
59.9 µs ± 7.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
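A rough sketch of how the outer loop could drive the jitted kernel, assuming the same stopping condition as in the question (this part is not included in the timings above):
# Repeatedly sweep the grid with the jitted kernel until every
# site is at or below the threshold of 3.
for k in range(0, z):
    y, count = f_jit(x)
    if count == n**2:
        break

print(y)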
Upvotes: 1