Fast method for summation over multidimensional arrays with indirect indexing in numpy

Question

What is the optimal (fastest) way to compute the following expression:

\sum_{i \in I} \alpha_i \sum_{j \in J} \beta_j M[:, i, j] for given numpy arrays:

I, J containing indices;
alpha and beta of shapes (1, |I|) and (1, |J|) containing coefficients;
M of ndim=3.

In a more general case, I need to compute it for a lot of alpha, beta, I, J inputs with the same M array. So it may be considered that alphas nad Is have shape (N, 4), betas and Js have shape (N, 3) and I need to compute this expression for each n in range(N).

Thank you in advance.

Based on some comments, to make the question more clear and to add some context, here is a naive approach to the problem with realistic sizes:

M has shape (500, 200000, 20)

I has shape (10^6, 4)

J has shape (10^6, 3)

alpha has shape (10^6, 4)

beta has shape (10^6, 3)

N = 10**6
M_new = np.zeros(M.shape[0], N)
for n in range(N):
    for i in range(4):
        for j in range(3):
            M_new[:, n] += alpha[n, i] * beta[n, j] * M[:, I[n, i], J[n, j]]

So the question is how to compute M_new as fast as possible.

Solutions

So far the fastest solution is the one proposed by @jdehesa using Numba.

@Han-KwangNienhuys presented speed comparison with alternative methods.

javidcf · Accepted Answer

EDIT:
I misunderstood the question in the first place. I left the original answer below but it does not do what the question asks.

At the risk of sounding obvious, you can always resort to Numba:

import numpy as np
import numba as nb

# Original loop implementation
def comb_loop(m, ii, jj, alpha, beta):
    n = ii.shape[0]
    m_new = np.zeros((m.shape[0], n))
    for col in range(n):
        for i in range(4):
            for j in range(3):
                m_new[:, col] += alpha[col, i] * beta[col, j] * m[:, ii[col, i], jj[col, j]]
    return m_new

# Numba implementation
@nb.njit(parallel=True)
def comb_nb(m, ii, jj, alpha, beta):
    n = ii.shape[0]
    m_new = np.empty((m.shape[0], n), m.dtype)
    for col in nb.prange(n):
        for row in range(m.shape[0]):
            val = 0
            for i in range(4):
                for j in range(3):
                    val += alpha[col, i] * beta[col, j] * m[row, ii[col, i], jj[col, j]]
            m_new[row, col] = val
    return m_new


# Test
np.random.seed(0)
N = 1_000  # Reduced for testing
m = np.random.rand(500, 200_000, 20)
ii = np.random.randint(m.shape[1], size=(N, 4))
jj = np.random.randint(m.shape[2], size=(N, 3))
alpha = np.random.rand(N, 4)
beta = np.random.rand(N, 3)

# Check results match
print(np.allclose(comb_loop(m, ii, jj, alpha, beta), comb_nb(m, ii, jj, alpha, beta)))
# True

# Timings
%timeit comb_loop(m, ii, jj, alpha, beta)
# 181 ms ± 1.77 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit comb_nb(m, ii, jj, alpha, beta)
# 31.1 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

ORIGINAL WRONG ANSWER

You can use np.einsum:

import numpy as np

def comb(alpha, beta, m):
    return np.einsum('i,j,nij->n', alpha, beta, m)

# Test
np.random.seed(0)
alpha = np.random.rand(10)
beta = np.random.rand(20)
m = np.random.rand(30, 10, 20)
result = comb(alpha, beta, m)
print(result.shape)
# (30,)

Fast method for summation over multidimensional arrays with indirect indexing in numpy

Answers (2)

Speed comparison

Related Questions