Reputation: 911
I want to be able to extract values from a pandas dataframe using a mask. However, after searching around, I cannot find a solution to my problem.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,2, size=(2,10)))
mask = np.random.randint(0,2, size=(1,10))
I basically want the mask to serve as an index lookup for each column: each entry of the mask gives the row to read in the corresponding column.
So if the mask were [0,1] for columns [a,b], I want to return:
df.iloc[0,a], df.iloc[1,b]
but in a Pythonic way.
I have tried e.g.:
df.apply(lambda x: df.iloc[mask[x], x] for x in range(len(mask)))
which raises a TypeError that I don't understand.
A for loop works but is slow.
Upvotes: 1
Views: 692
Reputation: 221504
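With NumPy, this is covered by advanced (integer array) indexing: index the underlying array with the row positions from the mask and a matching range of column positions. It should be pretty efficient -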
df.values[mask, np.arange(mask.size)]
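As a minimal sketch of how the pairing works (the small frame and the names `rows`, `cols`, `picked` and `looped` are made up for illustration, and `to_numpy()` is used here in place of `.values`): when two integer arrays are used as indices, NumPy pairs them element-wise and returns one element per pair.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are made up for this sketch).
df = pd.DataFrame([[10, 11, 12],
                   [20, 21, 22]])

# One row position per column, shaped like the question's mask.
mask = np.array([[0, 1, 0]])

rows = mask.ravel()            # flatten to 1-D: [0, 1, 0]
cols = np.arange(rows.size)    # column positions: [0, 1, 2]

# Integer arrays are paired element-wise: (0, 0), (1, 1), (0, 2).
picked = df.to_numpy()[rows, cols]

# Equivalent explicit loop, shown for comparison only.
looped = np.array([df.iloc[r, c] for r, c in zip(rows, cols)])

print(picked)  # [10 21 12]
```

The indexing expression and the loop select the same elements; the indexed version does it in a single NumPy operation instead of one `iloc` call per column.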
Sample run -
In [59]: df = pd.DataFrame(np.random.randint(11,99, size=(5,10)))
In [60]: mask = np.random.randint(0,5, size=(1,10))
In [61]: df
Out[61]:
    0   1   2   3   4   5   6   7   8   9
0  17  87  73  98  32  37  61  58  35  87
1  52  64  17  79  20  19  89  88  19  24
2  50  33  41  75  19  77  15  59  84  86
3  69  13  88  78  46  76  33  79  27  22
4  80  64  17  95  49  16  87  82  60  19
In [62]: mask
Out[62]: array([[2, 3, 0, 4, 2, 2, 4, 0, 0, 0]])
In [63]: df.values[mask, np.arange(mask.size)]
Out[63]: array([[50, 13, 73, 95, 19, 77, 87, 58, 35, 87]])
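Note that the result keeps the mask's 2-D shape, here (1, 10). If a flat, labelled result is preferred, one option is to flatten the mask first and wrap the values in a Series indexed by the column labels. This is only a sketch reusing the data from the sample run; `rows`, `values` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Data copied from the sample run above.
df = pd.DataFrame([
    [17, 87, 73, 98, 32, 37, 61, 58, 35, 87],
    [52, 64, 17, 79, 20, 19, 89, 88, 19, 24],
    [50, 33, 41, 75, 19, 77, 15, 59, 84, 86],
    [69, 13, 88, 78, 46, 76, 33, 79, 27, 22],
    [80, 64, 17, 95, 49, 16, 87, 82, 60, 19],
])
mask = np.array([[2, 3, 0, 4, 2, 2, 4, 0, 0, 0]])

rows = mask.ravel()                                  # shape (10,)
values = df.to_numpy()[rows, np.arange(rows.size)]   # one value per column

# Label each picked value with its column.
result = pd.Series(values, index=df.columns)
print(result.tolist())  # [50, 13, 73, 95, 19, 77, 87, 58, 35, 87]
```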
Upvotes: 2