Yongxin

Reputation: 173

Vectorized computation for arrays with different dimensions in Python

In scientific computation, a 3D field can be discretized as F[nx, ny, nz], where nx, ny and nz are the numbers of grid points in the 3 directions. Assume that an n-by-n tensor is attached to every point. The tensor field can then be represented as a 5D array T[n, n, nx, ny, nz], and the tensor at any point [i, j, k] can be selected as T[:, :, i, j, k]. If I want to compute the sum of the off-diagonal elements at each point, I would use code like

import numpy as np
r = np.zeros((nx, ny, nz)) 
for i in range(nx):
    for j in range(ny):
        for k in range(nz):
            r[i,j,k] = np.sum(T[:,:,i,j,k])-np.trace(T[:,:,i,j,k])

The result array r and the tensor field T have different dimensions. Looping over every element like this is inefficient in Python. Is there a way to do vectorized or otherwise efficient computation for arrays with different dimensions? Or is there another data type/structure that could be used?

Upvotes: 3

Views: 397

Answers (1)

unutbu

Reputation: 879093

Below are two alternatives. The first uses ndarray.sum and NumPy integer array indexing; the second uses np.einsum.

def using_sum(T):
    # Sum over the two tensor axes to get the per-point total.
    total = T.sum(axis=1).sum(axis=0)
    # Integer array indexing: T[m, m] selects the diagonal entries,
    # so summing them over axis 0 gives the per-point trace.
    m = np.arange(T.shape[0])
    trace = T[m, m].sum(axis=0)
    return total - trace

def using_einsum(T):
    return np.einsum('mnijk->ijk', T) - np.einsum('nnijk->ijk', T)

The first argument of np.einsum specifies the subscripts of the summation.

'mnijk->ijk' indicates that T has subscripts mnijk and that only the ijk subscripts remain after summation. The summation is therefore performed over the m and n subscripts. This makes np.einsum('mnijk->ijk', T)[i,j,k] equal to np.sum(T[:,:,i,j,k]), but computes the entire array in one vectorized operation.

Similarly, 'nnijk->ijk' tells np.einsum that T has subscripts nnijk and that again only the ijk subscripts survive the summation. The summation is therefore over n. Since the subscript n is repeated, only the diagonal elements are summed, which computes the trace.
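
As a quick sanity check, both identities can be verified directly on a small array (a minimal sketch; the shapes here are arbitrary):

import numpy as np

T = np.random.random((4, 4, 3, 3, 3))  # small example: 4x4 tensors on a 3x3x3 grid

# 'mnijk->ijk' sums over both tensor subscripts m and n.
total = np.einsum('mnijk->ijk', T)
assert np.isclose(total[1, 2, 0], np.sum(T[:, :, 1, 2, 0]))

# 'nnijk->ijk' sums the repeated subscript n, i.e. the diagonal: the trace.
trace = np.einsum('nnijk->ijk', T)
assert np.isclose(trace[1, 2, 0], np.trace(T[:, :, 1, 2, 0]))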

I like np.einsum because it communicates the intent of the computation succinctly. But it's also good to be familiar with how using_sum works, since it relies only on fundamental NumPy operations. It's a good example of how nested loops can be avoided by using NumPy methods that operate on whole arrays.
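
To see what the integer array indexing in using_sum does, here is a minimal sketch (with hypothetical small shapes): indexing the first two axes with m = np.arange(n) selects the diagonal entries T[d, d, :, :, :], and summing them over axis 0 gives the trace at every grid point.

import numpy as np

T = np.random.random((3, 3, 2, 2, 2))  # 3x3 tensors on a 2x2x2 grid

m = np.arange(3)
diag = T[m, m]            # shape (3, 2, 2, 2): the entries T[0,0], T[1,1], T[2,2]
trace = diag.sum(axis=0)  # shape (2, 2, 2): trace at every grid point

assert np.isclose(trace[0, 1, 0], np.trace(T[:, :, 0, 1, 0]))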


Here's a perfplot comparing the performance of orig versus using_sum and using_einsum as a function of n, where T is taken to have shape (10, 10, n, n, n):

import perfplot
import numpy as np

def orig(T):
    _, _, nx, ny, nz = T.shape
    r = np.zeros((nx, ny, nz)) 
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                r[i,j,k] = np.sum(T[:,:,i,j,k])-np.trace(T[:,:,i,j,k])
    return r

def using_einsum(T):
    r = np.einsum('mnijk->ijk', T) - np.einsum('nnijk->ijk', T)
    return r

def using_sum(T):
    total = T.sum(axis=1).sum(axis=0)
    m = np.arange(T.shape[0])
    trace = T[m,m].sum(axis=0)
    return total - trace

def make_T(n):
    return np.random.random((10,10,n,n,n))

perfplot.show(
    setup=make_T,
    kernels=[orig, using_sum, using_einsum],
    n_range=range(2, 80, 3),
    xlabel='n')

[perfplot benchmark: runtime of orig, using_sum and using_einsum as a function of n]

perfplot.show also checks that the values returned by orig, using_sum and using_einsum are equal.
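
If you want to run the same check by hand without perfplot, a minimal sketch with np.allclose does the job:

T = make_T(5)
assert np.allclose(orig(T), using_sum(T))
assert np.allclose(orig(T), using_einsum(T))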

Upvotes: 4
