p-value

Reputation: 648

Using views for NumPy broadcasting

Suppose I have a 2D np.array X, and I need to use X[:, None, :] in some intermediate computations; for instance np.sum(X[:, None, :] == Y[None, :, :], axis=2) where Y is also a 2D np.array.

Does this operation explicitly copy the memory for X and Y to create X[:, None, :] and Y[None, :, :]? If so, is there a way to avoid this copying by using views in NumPy?

Upvotes: 1

Views: 91

Answers (1)

user2357112

Reputation: 281683

X[:, None, :] and Y[None, :, :] are already views. Both operations are NumPy basic slicing, which always generates a view.
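A quick way to check this yourself is sketched below. The array contents are placeholders chosen for illustration; np.shares_memory and the owndata flag are standard NumPy.

```python
import numpy as np

# Placeholder data; any 2D array that owns its memory behaves the same way.
X = np.zeros((3, 4))

# Inserting an axis with None (np.newaxis) is basic indexing.
Xv = X[:, None, :]

print(Xv.shape)                  # shape gains a length-1 axis
print(np.shares_memory(X, Xv))   # True when no copy was made
print(Xv.flags.owndata)          # False for a view

# Writing through the view changes the original array.
Xv[0, 0, 0] = 7.0
print(X[0, 0])
```

Because the new axis has length 1, the view costs only a small array header, not another copy of the data.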

X[:, None, :] == Y[None, :, :] is going to be a much bigger memory problem, as it materializes a boolean array of shape (X.shape[0], Y.shape[0], X.shape[1]) before the sum reduces it. You can avoid this by rewriting your computation in terms of scipy.spatial.distance.cdist in 'hamming' mode:

In [10]: x
Out[10]: 
array([[3, 0, 2, 2, 3],
       [3, 2, 1, 3, 2],
       [2, 2, 1, 1, 1]])
In [11]: y
Out[11]: 
array([[0, 0, 1, 2, 3],
       [2, 0, 0, 1, 1],
       [2, 0, 2, 3, 3],
       [2, 1, 1, 2, 1]])
In [12]: numpy.sum(x[:, None, :] == y[None, :, :], axis=2)
Out[12]: 
array([[3, 1, 3, 1],
       [1, 0, 1, 1],
       [1, 3, 1, 3]])
In [13]: 5 - 5*cdist(x, y, 'hamming') # 5 for the row length of x and y
Out[13]: 
array([[ 3.,  1.,  3.,  1.],
       [ 1.,  0.,  1.,  1.],
       [ 1.,  3.,  1.,  3.]])

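There's no option to compute non-normalized Hamming distances in scipy.spatial.distance, so we have to undo the normalization: cdist returns the fraction of positions that differ, so multiplying by the row length gives the number of mismatches, and subtracting that from the row length gives the number of matches.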
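The conversion can be wrapped in a small helper, roughly as follows. The name match_counts is made up for this sketch and is not part of SciPy; it assumes both inputs are 2D with the same number of columns, and it rounds because cdist returns floats.

```python
import numpy as np
from scipy.spatial.distance import cdist


def match_counts(x, y):
    """Count equal positions between every row of x and every row of y.

    Hypothetical helper (not a SciPy function). Assumes x and y are 2D
    arrays with the same number of columns.
    """
    n = x.shape[1]
    # 'hamming' gives the fraction of differing positions per row pair.
    mismatches = n * cdist(x, y, 'hamming')
    # Round before casting so floating-point error cannot drop a count.
    return np.rint(n - mismatches).astype(int)


x = np.array([[3, 0, 2, 2, 3],
              [3, 2, 1, 3, 2],
              [2, 2, 1, 1, 1]])
y = np.array([[0, 0, 1, 2, 3],
              [2, 0, 0, 1, 1],
              [2, 0, 2, 3, 3],
              [2, 1, 1, 2, 1]])

counts = match_counts(x, y)

# Cross-check against the broadcasting version on this small input.
expected = np.sum(x[:, None, :] == y[None, :, :], axis=2)
print(np.array_equal(counts, expected))
```

The result has shape (len(x), len(y)), so the three-dimensional boolean intermediate is never allocated.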

Upvotes: 2
