p-value

Reputation: 648

Avoiding double for-loops in NumPy array operations

Suppose I have two 2D NumPy arrays A and B. I would like to compute the matrix C whose entries are C[i, j] = f(A[i, :], B[:, j]), where f is some function that takes two 1D arrays of equal length and returns a number.

For instance, if f is def f(x, y): return np.sum(x * y), then I would simply have C = np.dot(A, B). However, for a general function f, are there NumPy/SciPy utilities I could exploit that are more efficient than a double for-loop?
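As a baseline, here is a minimal sketch of the double loop being asked about, written for an arbitrary f. The helper name pairwise_loop is hypothetical. With f(x, y) = np.sum(x * y) it reproduces np.dot(A, B), which makes it a handy reference for checking any vectorised replacement.

```python
import numpy as np

def pairwise_loop(A, B, f):
    """C[i, j] = f(A[i, :], B[:, j]) computed with an explicit double loop."""
    m, n = A.shape[0], B.shape[1]
    C = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            C[i, j] = f(A[i, :], B[:, j])
    return C

rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(3, 4))   # shape (m, k)
B = rng.integers(0, 5, size=(4, 2))   # shape (k, n)

C_loop = pairwise_loop(A, B, lambda x, y: np.sum(x * y))
print(np.array_equal(C_loop, np.dot(A, B)))  # True
```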

For example, take def f(x, y): return np.sum(x != y) / len(x), where x and y are not simply 0/1-bit vectors.
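For this f, C[i, j] is the fraction of positions at which row i of A and column j of B disagree, so the entries can be any comparable values, not just bits. A tiny sketch with made-up integer vectors; it also shows that the same quantity can be written as np.mean(x != y).

```python
import numpy as np

def f(x, y):
    # Fraction of positions at which x and y differ
    return np.sum(x != y) / len(x)

x = np.array([3, 1, 4, 1])
y = np.array([3, 5, 4, 2])
print(f(x, y))           # 0.5
print(np.mean(x != y))   # 0.5, the same quantity
```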

Upvotes: 0

Views: 775

Answers (1)

Till Hoffmann

Reputation: 9877

Here is a reasonably general approach using broadcasting.

First, reshape your two matrices to be rank-four tensors.

A = A.reshape(A.shape + (1, 1))
B = B.reshape((1, 1) + B.shape)
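A small sketch with made-up sizes m = 2, k = 3, n = 4 showing what the reshape does to the shapes. np.broadcast reports the shape the two tensors broadcast to without computing anything; the names A4 and B4 are only for illustration.

```python
import numpy as np

m, k, n = 2, 3, 4
A = np.arange(m * k).reshape(m, k)   # shape (m, k)
B = np.arange(k * n).reshape(k, n)   # shape (k, n)

A4 = A.reshape(A.shape + (1, 1))     # (m, k, 1, 1)
B4 = B.reshape((1, 1) + B.shape)     # (1, 1, k, n)

print(A4.shape, B4.shape)            # (2, 3, 1, 1) (1, 1, 3, 4)
print(np.broadcast(A4, B4).shape)    # (2, 3, 3, 4)
```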

Second, apply your function element by element without performing any reduction.

C = f(A, B)  # element-wise part of f only, e.g. A != B

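Reshaping the matrices this way lets NumPy broadcast them against each other. If A originally has shape (m, k) and B has shape (k, n), the resulting tensor C has shape (m, k, k, n), i.e. the original A.shape + B.shape, and C[i, a, b, j] compares A[i, a] with B[b, j].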
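A quick sketch, on small random integer matrices, confirming which elements end up compared in the rank-four tensor. Only the entries with b == a pair A[i, :] with B[:, j] position by position, which is what the question's f needs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(2, 3))   # (m, k)
B = rng.integers(0, 3, size=(3, 4))   # (k, n)

# Element-wise comparison of the rank-four views, no reduction yet
C = A.reshape(A.shape + (1, 1)) != B.reshape((1, 1) + B.shape)
print(C.shape)  # (2, 3, 3, 4)

# C[i, a, b, j] compares A[i, a] with B[b, j]
i, a, b, j = 1, 2, 0, 3
print(C[i, a, b, j] == (A[i, a] != B[b, j]))  # True
```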

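Third, keep only the entries where the two inner indices coincide (C[i, a, a, j] compares A[i, a] with B[a, j]) and reduce over that shared index, for example by summing and dividing by its length k:

C = np.diagonal(C, axis1=1, axis2=2).sum(axis=-1) / C.shape[1]  # shape (m, n)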
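The rank-four tensor has m * k * k * n entries, of which only the k diagonal slices are used. A leaner variant, sketched below under the assumption that f splits into a broadcasting element-wise operation followed by a reduction that accepts an axis argument, adds a single length-1 axis to each matrix so the shared axis lines up directly, giving an (m, k, n) intermediate. The names pairwise_broadcast and pairwise_loop are hypothetical helpers; the loop is only there to check the result.

```python
import numpy as np

def pairwise_broadcast(A, B, elementwise, reduction):
    """Vectorised C[i, j] = reduction(elementwise(A[i, :], B[:, j])).

    Assumes `elementwise` broadcasts like a NumPy ufunc and `reduction`
    accepts an `axis` keyword (e.g. np.sum, np.mean).
    """
    if A.shape[1] != B.shape[0]:
        raise ValueError("A.shape[1] must equal B.shape[0]")
    # (m, k, 1) op (1, k, n) -> (m, k, n); then reduce the shared axis 1
    return reduction(elementwise(A[:, :, np.newaxis], B[np.newaxis, :, :]), axis=1)


def pairwise_loop(A, B, f):
    """Reference double loop, only used to check the vectorised result."""
    return np.array([[f(A[i, :], B[:, j]) for j in range(B.shape[1])]
                     for i in range(A.shape[0])])


rng = np.random.default_rng(42)
A = rng.integers(0, 4, size=(5, 6))
B = rng.integers(0, 4, size=(6, 3))

# The question's example: fraction of positions where x and y differ
C = pairwise_broadcast(A, B, np.not_equal, np.mean)
C_ref = pairwise_loop(A, B, lambda x, y: np.sum(x != y) / len(x))
print(C.shape, np.allclose(C, C_ref))  # (5, 3) True

# The dot-product special case reduces to A @ B
print(np.array_equal(pairwise_broadcast(A, B, np.multiply, np.sum), A @ B))  # True
```

Memory still grows as m * k * n, so for large inputs one option is to apply pairwise_broadcast to blocks of rows of A and stack the results. For this particular f, scipy.spatial.distance.cdist(A, B.T, metric='hamming') should also give the same matrix, since SciPy's Hamming distance is the proportion of disagreeing components.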

Upvotes: 1
