Reputation: 924
I want to avoid using for loop in the following code to achieve performance. Is vectorization suitable for this kind of problem?
a = np.array([[0,1,2,3,4],
[5,6,7,8,9],
[0,1,2,3,4],
[5,6,7,8,9],
[0,1,2,3,4]],dtype= np.float32)
temp_a = np.copy(a)
for i in range(1,a.shape[0]-1):
for j in range(1,a.shape[1]-1):
if a[i,j] > 3:
temp_a[i+1,j] += a[i,j] / 5.
temp_a[i-1,j] += a[i,j] / 5.
temp_a[i,j+1] += a[i,j] / 5.
temp_a[i,j-1] += a[i,j] / 5.
temp_a[i,j] -= a[i,j] * 4. / 5.
a = np.copy(temp_a)
Upvotes: 5
Views: 365
Reputation: 924
I came up with this solution:
a = np.array([[0,0,0,0,0],
[0,6,2,8,0],
[0,1,5,3,0],
[0,6,7,8,0],
[0,0,0,0,0]],dtype= np.float32)
up= np.zeros_like(a)
down= np.zeros_like(a)
right= np.zeros_like(a)
left = np.zeros_like(a)
def new_a(a,u,r,d,l):
c = np.copy(a)
c[c <= 3] = 0
up[:-2, 1:-1] += c[1:-1,1:-1] / 5.
down[2:, 1:-1] += c[1:-1,1:-1] / 5.
left[1:-1, :-2] += c[1:-1,1:-1]/ 5.
right[1:-1, 2:] += c[1:-1,1:-1] / 5.
a[1:-1,1:-1] -= c[1:-1,1:-1] * 4. / 5.
a += up + down + left + right
return a
Upvotes: 1
Reputation: 3077
You are basically doing convolution, with some special treatment for borders.
Try the following:
from scipy.signal import convolve2d
# define your filter
f = np.array([[0.0, 0.2, 0.0],
[0.2,-0.8, 0.2],
[0.0, 0.2, 0.0]])
# select parts of 'a' to be used for convolution
b = (a * (a > 3))[1:-1, 1:-1]
# convolve, padding with zeros ('same' mode)
c = convolve2d(b, f, mode='same')
# add the convolved result to 'a', excluding borders
a[1:-1, 1:-1] += c
# treat the special cases of the borders
a[0, 1:-1] += .2 * b[0, :]
a[-1, 1:-1] += .2 * b[-1, :]
a[1:-1, 0] += .2 * b[:, 0]
a[1:-1, -1] += .2 * b[:, -1]
It gives the following result, which is the same as you nested loops.
[[ 0. 2.2 3.4 4.6 4. ]
[ 6.2 2.6 4.2 3. 10.6]
[ 0. 3.4 4.8 6.2 4. ]
[ 6.2 2.6 4.2 3. 10.6]
[ 0. 2.2 3.4 4.6 4. ]]
Upvotes: 2
Reputation: 7994
My trail uses 3 filters, rot90
, np.where
, np.sum
, and np.multiply
. I am not sure which way to benchmark is more reasonable. If you do not take into account the time to create filters, it is roughly 4 times faster.
# Each filter basically does what `op` tries to achieve in a loop
filter1 = np.array([[0, 1 ,0, 0, 0],
[1, -4, 1, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]) /5.
filter2 = np.array([[0, 0 ,1, 0, 0],
[0, 1, -4, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]) /5.
filter3 = np.array([[0, 0 ,0, 0, 0],
[0, 0, 1, 0, 0],
[0, 1, -4, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0]]) /5.
# only loop over the center of the matrix, a
center = np.array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]])
filter1
and filter2
can be rotated to represent 4 filters individually.
filter1_90_rot = np.rot90(filter1, k=1)
filter1_180_rot = np.rot90(filter1, k=2)
filter1_270_rot = np.rot90(filter1, k=3)
filter2_90_rot = np.rot90(filter2, k=1)
filter2_180_rot = np.rot90(filter2, k=2)
filter2_270_rot = np.rot90(filter2, k=3)
# Based on different index from `a` return different filter
filter_dict = {
(1,1): filter1,
(3,1): filter1_90_rot,
(3,3): filter1_180_rot,
(1,3): filter1_270_rot,
(1,2): filter2,
(2,1): filter2_90_rot,
(3,2): filter2_180_rot,
(2,3): filter2_270_rot,
(2,2): filter3
}
Main function
def get_new_a(a):
x, y = np.where(((a > 3) * center) > 0) # find pairs that match the condition
return a + np.sum(np.multiply(filter_dict[i, j], a[i,j])
for (i, j) in zip(x,y))
Note: There seem to be some numerical errors such that np.equal()
would mostly return False
between my result and OP's while np.close()
would return true.
def op():
temp_a = np.copy(a)
for i in range(1,a.shape[0]-1):
for j in range(1,a.shape[1]-1):
if a[i,j] > 3:
temp_a[i+1,j] += a[i,j] / 5.
temp_a[i-1,j] += a[i,j] / 5.
temp_a[i,j+1] += a[i,j] / 5.
temp_a[i,j-1] += a[i,j] / 5.
temp_a[i,j] -= a[i,j] * 4. / 5.
a2 = np.copy(temp_a)
%timeit op()
167 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit get_new_a(a):
37.2 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note again, we ignore the time to create filter as I think it would be a one time thing. If you do want to include the time to create filters, it is roughly two times faster. You might think it is not fair becasue op's method contains two np.copy
. The bottleneck of op's method, I think, is the for loop.
numpy.multiply
do a elementwise multiplication between two matrix.
np.rot90
does rotation for us. k
is a parameter that you can decide how many times to rotate.
np.isclose
can use this function to check whether two matrices are close within some error that you can define.
Upvotes: 1