Reputation: 648
Suppose I have a 2D np.array X, and I need to use X[:, None, :] in some intermediate computations; for instance np.sum(X[:, None, :] == Y[None, :, :], axis=2), where Y is also a 2D np.array.
Does this operation explicitly copy the memory for X and Y to create X[:, None, :] and Y[None, :, :]? If so, is there a way to avoid this copying by using views in NumPy?
Upvotes: 1
Views: 91
Reputation: 281683
X[:, None, :] and Y[None, :, :] are already views. Both operations are NumPy basic slicing, which always produces a view.
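As a quick sanity check, you can verify this yourself (a minimal sketch; np.shares_memory reports whether two arrays overlap in memory, and a view's .base attribute points back to the array that owns the data):

```python
import numpy as np

X = np.zeros((3, 4))
view = X[:, None, :]   # basic slicing: no data is copied

# The new array reuses X's buffer rather than allocating its own.
print(np.shares_memory(X, view))  # True
print(view.base is X)             # True
```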
X[:, None, :] == Y[None, :, :] is going to be a much bigger memory problem, as it creates a very large boolean array. You can avoid this by rewriting your computation in terms of scipy.spatial.distance.cdist in 'hamming' mode:
In [10]: x
Out[10]:
array([[3, 0, 2, 2, 3],
[3, 2, 1, 3, 2],
[2, 2, 1, 1, 1]])
In [11]: y
Out[11]:
array([[0, 0, 1, 2, 3],
[2, 0, 0, 1, 1],
[2, 0, 2, 3, 3],
[2, 1, 1, 2, 1]])
In [12]: numpy.sum(x[:, None, :] == y[None, :, :], axis=2)
Out[12]:
array([[3, 1, 3, 1],
[1, 0, 1, 1],
[1, 3, 1, 3]])
In [13]: 5 - 5*cdist(x, y, 'hamming') # 5 for the row length of x and y
Out[13]:
array([[ 3., 1., 3., 1.],
[ 1., 0., 1., 1.],
[ 1., 3., 1., 3.]])
There's no option to compute non-normalized Hamming distances in scipy.spatial.distance, so we have to undo the normalization: 'hamming' returns the fraction of positions that differ, so multiplying by the row length gives the count of differing positions, and subtracting that from the row length gives the count of matching positions.
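Putting it together as a self-contained script (a sketch using the same small x and y as above; for real workloads the point is that cdist never materializes the (3, 4, 5) boolean intermediate):

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[3, 0, 2, 2, 3],
              [3, 2, 1, 3, 2],
              [2, 2, 1, 1, 1]])
y = np.array([[0, 0, 1, 2, 3],
              [2, 0, 0, 1, 1],
              [2, 0, 2, 3, 3],
              [2, 1, 1, 2, 1]])

n = x.shape[1]  # row length (5 here)

# Broadcasting version: allocates a large boolean intermediate.
counts_broadcast = np.sum(x[:, None, :] == y[None, :, :], axis=2)

# cdist version: 'hamming' is the *fraction* of differing positions,
# so n - n*hamming recovers the count of matching positions.
counts_cdist = n - n * cdist(x, y, 'hamming')

print(np.allclose(counts_broadcast, counts_cdist))  # True
```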
Upvotes: 2