Reputation: 3375
Hi, I have a MultiIndex DataFrame like the one below, and I want to randomly select part of it according to ID_1:
ID_1 ID_2  feature_1  feature_2
1    1             0          0
     2             1          1
2    1             1          1
     2             0          1
3    1             1          1
     2             0          1
4    1             1          1
     2             0          1
I want to select 2 of the 4 ID_1 values at random. Example result:
ID_1 ID_2  feature_1  feature_2
2    1             1          1
     2             0          1
4    1             1          1
     2             0          1
What is the best way to do this? Thank you.
Upvotes: 1
Views: 377
Reputation: 402613
Use np.random.choice and select 2 levels at random from df.index.levels[0]. You can then use the selected levels to index into df using df.loc.
df
           feature_1  feature_2
ID_1 ID_2
1    1             0          0
     2             1          1
2    1             1          1
     2             0          1
3    1             1          1
     2             0          1
4    1             1          1
     2             0          1
# np.random.seed(0) # Uncomment to make results reproducible.
df.loc[np.random.choice(df.index.levels[0], 2, replace=False)]
           feature_1  feature_2
ID_1 ID_2
3    1             1          1
     2             0          1
4    1             1          1
     2             0          1
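For a version you can paste and run, here is a minimal sketch that rebuilds the example frame and wraps the sampling step in a helper. The helper name `sample_groups` and its `seed` parameter are my own for illustration, not part of pandas. It draws from the labels actually present in the index (via `get_level_values`) rather than `df.index.levels[0]`, because `levels` can still list labels that no longer have rows if the frame was filtered earlier, and indexing with such a label may raise a `KeyError`.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_product([[1, 2, 3, 4], [1, 2]], names=["ID_1", "ID_2"])
df = pd.DataFrame(
    {
        "feature_1": [0, 1, 1, 0, 1, 0, 1, 0],
        "feature_2": [0, 1, 1, 1, 1, 1, 1, 1],
    },
    index=index,
)


def sample_groups(frame, n, level=0, seed=None):
    """Keep every row whose label on `level` is among n randomly chosen labels.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rng = np.random.default_rng(seed)  # pass a seed for reproducible draws
    # Labels that actually occur in the index (unused levels are ignored).
    labels = frame.index.get_level_values(level).unique()
    chosen = rng.choice(labels, size=n, replace=False)
    # A boolean mask keeps the rows in their original order.
    return frame[frame.index.get_level_values(level).isin(chosen)]


sampled = sample_groups(df, 2, seed=0)
print(sampled)
```

If the index may contain stale levels and you would rather keep using `df.index.levels[0]`, calling `df.index.remove_unused_levels()` first is another option.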
If you need to do the same thing for the second level (ID_2), use pd.IndexSlice to slice on that level while keeping every label of the first.
v = np.random.choice(df.index.levels[1], 2, replace=False)
df.loc[pd.IndexSlice[:, v], :]
# df.loc(axis=0)[pd.IndexSlice[:, v]]
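As a rough, self-contained check of the second-level selection, the sketch below rebuilds the example frame, picks one ID_2 label at random (one rather than two, since the example has only two ID_2 values and picking both would return the whole frame), and compares the pd.IndexSlice result with a plain boolean mask. The variable names `by_slice` and `by_mask` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_product([[1, 2, 3, 4], [1, 2]], names=["ID_1", "ID_2"])
df = pd.DataFrame(
    {
        "feature_1": [0, 1, 1, 0, 1, 0, 1, 0],
        "feature_2": [0, 1, 1, 1, 1, 1, 1, 1],
    },
    index=index,
)

rng = np.random.default_rng(0)  # seeded only so the draw is repeatable
v = rng.choice(df.index.levels[1], size=1, replace=False)

# Slice on the second level: ":" keeps every ID_1, v selects the ID_2 labels.
by_slice = df.loc[pd.IndexSlice[:, v], :]

# The same selection written as a boolean mask on the ID_2 level.
by_mask = df[df.index.get_level_values("ID_2").isin(v)]

print(by_slice)
```

Both forms should select the same rows here; the mask version does not depend on the index being sorted, which can matter on frames that were shuffled or concatenated.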
Upvotes: 1