Illia Iorin

Reputation: 27

How to speed up numpy tensor*tensor operation

I have a bottleneck in my code: elementwise multiplication of a 3D numpy array by another 3D numpy array with the * operator.
I tried to speed this part of the program up with numba's @njit or @jit decorator, but it roughly halved performance instead.
Slow part of the code:

import numba

@numba.jit
def mat_mul_and_sum(img1, img2, alpha):
    return img1*(1-alpha) + img2*alpha

img1, img2, and alpha are 3D np.arrays with the same shape.
Is it possible to speed up this line of code?

Upvotes: 2

Views: 327

Answers (2)

user3483203

Reputation: 51155

One option is to use numba the way it's intended: writing out explicit loops rather than just applying a decorator to vectorized numpy code. However, for this particular expression you can get multicore evaluation with the numexpr package.


import numexpr as ne

def mat_mul_and_sum_numexpr(a, b, alpha):
    return ne.evaluate('a*(1-alpha) + b*alpha')

Using the timings from the other answer:

In [11]: %timeit mat_mul_and_sum(img1, img2, alpha)
21.6 ms ± 955 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [12]: %timeit mat_mul_and_sum2(img1, img2, alpha)
6.35 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit mat_mul_and_sum_numexpr(img1, img2, alpha)
4.22 ms ± 54.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [14]: np.allclose(mat_mul_and_sum(img1, img2, alpha), mat_mul_and_sum_numexpr(img1, img2, alpha))
Out[14]: True

You might be able to squeeze out some extra performance with numba's parallelization, but oftentimes numexpr provides a nice performance boost without requiring any code rewrite.
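As a side note, numexpr uses all available cores by default; if you want to control the worker count, it exposes set_num_threads. A minimal sketch (array sizes here are illustrative, not from the question):

```python
import numpy as np
import numexpr as ne

a = np.random.normal(size=(50, 50, 50))
b = np.random.normal(size=(50, 50, 50))
alpha = np.random.normal(size=(50, 50, 50))

ne.set_num_threads(4)  # cap the thread pool; default is all cores
out = ne.evaluate('a*(1-alpha) + b*alpha')
```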

Upvotes: 4

JoshAdel

Reputation: 68682

If you unroll the loops as follows, for an array size of (100, 100, 100), numba is twice as fast as the pure numpy version, likely due to the fact that no intermediate arrays need to be allocated:

import numpy as np
import numba as nb

def mat_mul_and_sum(img1, img2, alpha):
    return img1*(1-alpha) + img2*alpha


@nb.jit
def mat_mul_and_sum2(img1, img2, alpha):
    NI, NJ, NK = img1.shape
    out = np.empty((NI, NJ, NK))

    for i in range(NI):
        for j in range(NJ):
            for k in range(NK):
                out[i,j,k] = img1[i,j,k] * (1.0 - alpha[i,j,k]) + img2[i,j,k] * alpha[i,j,k]

    return out

and then testing:

N = 100
img1 = np.random.normal(size=(N, N, N))
img2 = np.random.normal(size=(N, N, N))
alpha = np.random.normal(size=(N, N, N))

A = mat_mul_and_sum(img1, img2, alpha)
B = mat_mul_and_sum2(img1, img2, alpha)

np.allclose(A,B) #True

%timeit mat_mul_and_sum(img1, img2, alpha)
# 4.6 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit mat_mul_and_sum2(img1, img2, alpha)
# 2.4 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Update: You can also try changing the decorator to nb.jit(parallel=True) and replacing the outer loop with for i in nb.prange(NI):, which on my machine drops the timeit result to 1.37 ms. These and the other timings will certainly differ from machine to machine and with the size of the inputs.

Upvotes: 3
