Reputation: 924

Vectorization to achieve performance

I want to avoid using for loop in the following code to achieve performance. Is vectorization suitable for this kind of problem?

a = np.array([[0,1,2,3,4],
              [5,6,7,8,9],
              [0,1,2,3,4],
              [5,6,7,8,9],
              [0,1,2,3,4]],dtype= np.float32)

temp_a = np.copy(a)

for i in range(1,a.shape[0]-1):
    for j in range(1,a.shape[1]-1):
        if a[i,j] > 3:
            temp_a[i+1,j] += a[i,j] / 5.
            temp_a[i-1,j] += a[i,j] / 5.
            temp_a[i,j+1] += a[i,j] / 5.
            temp_a[i,j-1] += a[i,j] / 5.
            temp_a[i,j]   -= a[i,j] * 4. / 5.
a = np.copy(temp_a)

Upvotes: 5

Answers (3)

Behzad Jamali

Reputation: 924

I came up with this solution:

a = np.array([[0,0,0,0,0],
              [0,6,2,8,0],
              [0,1,5,3,0],
              [0,6,7,8,0],
              [0,0,0,0,0]],dtype= np.float32)

up= np.zeros_like(a)
down= np.zeros_like(a)
right= np.zeros_like(a)
left = np.zeros_like(a)

def new_a(a,u,r,d,l):

    c = np.copy(a)
    c[c <= 3] = 0

    up[:-2, 1:-1] += c[1:-1,1:-1] / 5.
    down[2:, 1:-1] += c[1:-1,1:-1] / 5.
    left[1:-1, :-2] += c[1:-1,1:-1]/ 5.
    right[1:-1, 2:] += c[1:-1,1:-1] / 5.
    a[1:-1,1:-1] -= c[1:-1,1:-1] * 4. / 5.
    a += up + down + left + right

    return a

Upvotes: 1

grovina

Reputation: 3077

You are basically doing convolution, with some special treatment for borders.

Try the following:

from scipy.signal import convolve2d


# define your filter
f = np.array([[0.0, 0.2, 0.0],
              [0.2,-0.8, 0.2],
              [0.0, 0.2, 0.0]])

# select parts of 'a' to be used for convolution
b = (a * (a > 3))[1:-1, 1:-1]

# convolve, padding with zeros ('same' mode)
c = convolve2d(b, f, mode='same')

# add the convolved result to 'a', excluding borders
a[1:-1, 1:-1] += c

# treat the special cases of the borders
a[0, 1:-1] += .2 * b[0, :]
a[-1, 1:-1] += .2 * b[-1, :]
a[1:-1, 0] += .2 * b[:, 0]
a[1:-1, -1] += .2 * b[:, -1]

It gives the following result, which is the same as you nested loops.

[[  0.    2.2   3.4   4.6   4. ]
 [  6.2   2.6   4.2   3.   10.6]
 [  0.    3.4   4.8   6.2   4. ]
 [  6.2   2.6   4.2   3.   10.6]
 [  0.    2.2   3.4   4.6   4. ]]

Upvotes: 2

Tai

Reputation: 7994

My trail uses 3 filters, rot90, np.where, np.sum, and np.multiply. I am not sure which way to benchmark is more reasonable. If you do not take into account the time to create filters, it is roughly 4 times faster.

# Each filter basically does what `op` tries to achieve in a loop

filter1 = np.array([[0, 1 ,0, 0, 0],
                  [1, -4, 1, 0, 0],
                  [0, 1, 0, 0, 0],
                  [0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0]]) /5.

filter2 = np.array([[0, 0 ,1, 0, 0],
                  [0, 1, -4, 1, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0]]) /5.
filter3 = np.array([[0, 0 ,0, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 1, -4, 1, 0],
                  [0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0]]) /5.
# only loop over the center of the matrix, a
center = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 0, 0, 0]])

filter1 and filter2 can be rotated to represent 4 filters individually.

filter1_90_rot = np.rot90(filter1, k=1)
filter1_180_rot = np.rot90(filter1, k=2)
filter1_270_rot = np.rot90(filter1, k=3)
filter2_90_rot = np.rot90(filter2, k=1)
filter2_180_rot = np.rot90(filter2, k=2)
filter2_270_rot = np.rot90(filter2, k=3)

# Based on different index from `a` return different filter

filter_dict = {
             (1,1): filter1,
             (3,1): filter1_90_rot,
             (3,3): filter1_180_rot,
             (1,3): filter1_270_rot,
             (1,2): filter2,
             (2,1): filter2_90_rot,
             (3,2): filter2_180_rot,
             (2,3): filter2_270_rot,
             (2,2): filter3
            }

Main function

def get_new_a(a):
    x, y = np.where(((a > 3) * center) > 0) # find pairs that match the condition
    return a + np.sum(np.multiply(filter_dict[i, j], a[i,j])
                      for (i, j) in zip(x,y))

Note: There seem to be some numerical errors such that np.equal() would mostly return False between my result and OP's while np.close() would return true.

Timing results

def op():
    temp_a = np.copy(a)

    for i in range(1,a.shape[0]-1):
        for j in range(1,a.shape[1]-1):
            if a[i,j] > 3:
                temp_a[i+1,j] += a[i,j] / 5.
                temp_a[i-1,j] += a[i,j] / 5.
                temp_a[i,j+1] += a[i,j] / 5.
                temp_a[i,j-1] += a[i,j] / 5.
                temp_a[i,j]   -= a[i,j] * 4. / 5.
    a2 = np.copy(temp_a)   

%timeit op()
167 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit get_new_a(a):
37.2 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note again, we ignore the time to create filter as I think it would be a one time thing. If you do want to include the time to create filters, it is roughly two times faster. You might think it is not fair becasue op's method contains two np.copy. The bottleneck of op's method, I think, is the for loop.

Reference:

numpy.multiply do a elementwise multiplication between two matrix.
np.rot90 does rotation for us. k is a parameter that you can decide how many times to rotate. np.isclose can use this function to check whether two matrices are close within some error that you can define.

Upvotes: 1

Vectorization to achieve performance

Answers (3)

Timing results

Reference:

Related Questions