Reputation: 392
What I have:
import numpy as np
np.random.seed(42)
dlen = 250000
data = np.random.rand(dlen, 3, 3)
mask = np.random.choice([0, 1, 2], dlen)
What I want to get:
[[0.37454012 0.95071431 0.73199394],
[0.83244264 0.21233911 0.18182497],
[0.13949386 0.29214465 0.36636184],
[0.94888554 0.96563203 0.80839735],
[0.44015249 0.12203823 0.49517691],
....
(250000, 3)
What I try to use for this:
data[:,mask,:]
MemoryError: Unable to allocate 1.36 TiB for an array with shape (250000, 250000, 3) and data type float64
What gives the correct result but looks strange:
data[np.arange(data.shape[0]), mask, :]
So what's the correct way to use this mask?
Upd.: For each 3x3 block, the mask should select the row with the specified index (the index along axis 1). Example for an array with shape [2,3,3]:
array = [[[5 6 7], [7 8 9], [2 3 4]],
[[2 1 0], [7 6 5], [7 6 5]]]
mask = [1 0]
result = [[7 8 9],
[2 1 0]]
Upvotes: 1
Views: 764
Reputation: 23743
data[np.arange(data.shape[0]), mask, :]
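That works because np.arange(data.shape[0]) and mask are two integer index arrays of the same length, so NumPy pairs them element by element: row i of the result is data[i, mask[i], :]. With data[:,mask,:] the slice keeps every block and mask is applied to each of them in full, which is where the (250000, 250000, 3) shape comes from.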
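To make the difference concrete, here is a minimal sketch using the small [2,3,3] example from the question. The names `outer`, `paired`, `rows`, and `alt` are only illustrative, and the `np.take_along_axis` line is an alternative spelling that is not part of the original answer.

```python
import numpy as np

# The small example from the question: two 3x3 blocks, one index per block.
array = np.array([[[5, 6, 7], [7, 8, 9], [2, 3, 4]],
                  [[2, 1, 0], [7, 6, 5], [7, 6, 5]]])
mask = np.array([1, 0])

# Slice + one index array: every block is indexed with ALL entries of mask,
# so the result grows to (n_blocks, len(mask), 3).
outer = array[:, mask, :]
print(outer.shape)  # (2, 2, 3)

# Two index arrays of equal length are paired element by element:
# (0, mask[0]), (1, mask[1]), ...
rows = np.arange(array.shape[0])
paired = array[rows, mask, :]
print(paired)

# Alternative (an assumption of this sketch, not from the answer):
# take_along_axis picks one index per block along axis 1, then the
# length-1 axis is dropped.
alt = np.take_along_axis(array, mask[:, None, None], axis=1).squeeze(axis=1)
print(np.array_equal(paired, alt))
```

With 250000 blocks the first form would need 250000 * 250000 * 3 * 8 bytes, which is the 1.36 TiB reported in the error message.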
When I hear the term mask, I think of boolean indexing. Your integer mask can be converted to a boolean mask so that you can use it the way you want.
>>> data.shape
(250000, 3, 3)
>>> mask.shape
(250000,)
>>> q = mask[:,None] == [0,1,2]
>>> q.shape
(250000, 3)
>>> q[:5]
array([[ True, False, False],
[False, True, False],
[False, True, False],
[False, False, True],
[False, True, False]])
>>> r = data[q]
>>> r.shape
(250000, 3)
>>> r[:10]
array([[0.37454012, 0.95071431, 0.73199394],
[0.83244264, 0.21233911, 0.18182497],
[0.13949386, 0.29214465, 0.36636184],
[0.94888554, 0.96563203, 0.80839735],
[0.44015249, 0.12203823, 0.49517691],
[0.66252228, 0.31171108, 0.52006802],
[0.59789998, 0.92187424, 0.0884925 ],
[0.14092422, 0.80219698, 0.07455064],
[0.00552212, 0.81546143, 0.70685734],
[0.31098232, 0.32518332, 0.72960618]])
You could use the length of the second dimension to make it a little more generic:
>>> q = mask[:,None] == np.arange(data.shape[1])
>>> q[:5]
array([[ True, False, False],
[False, True, False],
[False, True, False],
[False, False, True],
[False, True, False]])
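As a quick sanity check, the boolean form and the integer form should give the same rows. The sketch below assumes a small array (`n = 6`) and an arbitrary seed chosen only for illustration; `by_bool` and `by_int` are hypothetical names.

```python
import numpy as np

np.random.seed(0)          # arbitrary seed, only for reproducibility here
n = 6                      # small size so the check runs instantly
data = np.random.rand(n, 3, 3)
mask = np.random.choice([0, 1, 2], n)

# One-hot boolean mask of shape (n, 3): exactly one True per block.
q = mask[:, None] == np.arange(data.shape[1])

# Boolean indexing over the first two axes keeps the trailing axis,
# and returns the selected rows in block order.
by_bool = data[q]

# Integer (paired) indexing from the question.
by_int = data[np.arange(n), mask, :]

print(by_bool.shape, by_int.shape)
print(np.array_equal(by_bool, by_int))
```

The row order matches because boolean indexing walks the True entries in row-major order and each row of `q` contains exactly one True.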
If you control construction of the mask, you might want to construct it as a boolean array.
If this is new code, you might want to use a version of NumPy that provides the new random Generator API (np.random.default_rng, available since NumPy 1.17) instead of np.random.seed and the legacy functions.
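A minimal sketch of the question's setup with the Generator API might look like this. It assumes NumPy 1.17 or later; the generated numbers will not match the values shown above, because the Generator uses a different random stream than np.random.seed.

```python
import numpy as np

rng = np.random.default_rng(42)        # new-style Generator
dlen = 250_000

data = rng.random((dlen, 3, 3))        # replaces np.random.rand(dlen, 3, 3)
mask = rng.integers(0, 3, size=dlen)   # values 0, 1, 2 (upper bound exclusive)

# Same paired integer indexing as in the question.
result = data[np.arange(dlen), mask, :]
print(result.shape)  # (250000, 3)
```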
Upvotes: 2