Reputation: 170
I'm writing a Python function to integrate a vector field on a high-dimensional matrix space. A, of shape (n, m), is a matrix whose time derivative is linear in each of its components A[i, j]. We can collect all of the coefficients of the derivative into a 4D array C such that C[i, j, k, l] is the coefficient of A[k, l] in the derivative of A[i, j]. In this case, the derivative of A is given by dA[i, j] == (C[i, j] * A).sum(). Thus it is correct to compute
dA = np.array([[(Cij * A).sum() for Cij in Ci] for Ci in C])
Fortunately, C can be represented as a sparse.COO object, so the above requires only O(nm) multiplications. But the two for loops are still slow. Thanks to a helpful comment I improved this to
dA = (C * A).sum(axis=3).sum(axis=2)
leveraging broadcasting for a significant speedup. Can anyone go faster?
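For reference, the nested-loop and broadcast formulations above can be checked against each other on a small dense example (a minimal sketch with arbitrary shapes; the random test data is not from the original question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 5
A = rng.standard_normal((n, m))
# C[i, j, k, l] is the coefficient of A[k, l] in dA[i, j]
C = rng.standard_normal((n, m, n, m))

# Nested-loop version
dA_loop = np.array([[(Cij * A).sum() for Cij in Ci] for Ci in C])

# Broadcast version: (n, m, n, m) * (n, m) broadcasts over the last two axes
dA_bcast = (C * A).sum(axis=3).sum(axis=2)

print(np.allclose(dA_loop, dA_bcast))  # True
```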
Upvotes: 2
Views: 448
Reputation: 1623
You could use np.einsum to accelerate this even more, since it avoids allocating the intermediate (C * A) array. Or, at a minimum, you could do (C * A).sum(axis=(2,3)) to remove one intermediate step.
import numpy as np

A = np.full((12, 12), 2)
C = np.full((12, 12, 3, 2), 1).T  # shape (2, 3, 12, 12)
dA = (C * A).sum(axis=3).sum(axis=2)

# Both alternatives should match the two-step sum:
print(np.einsum('abkl,ijkl->ij', A[None, None], C) == dA)
print((C * A).sum(axis=(2, 3)) == dA)
Output:
[[ True True True]
[ True True True]]
[[ True True True]
[ True True True]]
To be entirely honest, I don't completely understand your mathematical problem, and I'm also not that good with einsum. That is, you should double-check that the algorithm and the test case are correct :)
EDIT: added the .sum(axis=(2,3)) method
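Two further variants may be worth benchmarking: a direct einsum signature for the contraction sum over k, l of C[i, j, k, l] * A[k, l], and the equivalent np.tensordot call. This is a sketch assuming C has shape (n, m, n, m) as in the question, with arbitrary test shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 5
A = rng.standard_normal((n, m))
C = rng.standard_normal((n, m, n, m))

dA_ref = (C * A).sum(axis=(2, 3))

# Same contraction, written directly: dA[i, j] = sum_kl C[i, j, k, l] * A[k, l]
dA_einsum = np.einsum('ijkl,kl->ij', C, A)

# tensordot contracts C's axes (2, 3) against A's axes (0, 1)
dA_tdot = np.tensordot(C, A, axes=([2, 3], [0, 1]))

print(np.allclose(dA_ref, dA_einsum), np.allclose(dA_ref, dA_tdot))  # True True
```

tensordot in particular reduces to a reshaped matrix product, which can be faster than elementwise multiply-then-sum for dense arrays.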
Upvotes: 1