KayEss

Reputation: 419

Randomly drop n-groups from pandas dataframe

I have a dataframe with 15466 rows × 125 columns. The column "Subject_ID" contains 400 unique IDs, each of which appears approximately 40 times. I want to drop 10 random subjects from my dataframe (roughly 400 rows). So far I have tried this:

trial = df.groupby(['Subject_ID']).apply(lambda x: x.sample(10))

but I realized that this takes 10 random rows from each Subject_ID instead of selecting 10 whole groups (Subject_IDs).

Upvotes: 1

Views: 141

Answers (1)

Chris Adams

Reputation: 18647

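You could use Series.unique with numpy.random.choice to randomly select 10 IDs, then boolean index using isin to filter them out of your DataFrame. Pass replace=False, because numpy.random.choice samples with replacement by default and could otherwise pick the same ID more than once, leaving you with fewer than 10 dropped subjects:

import numpy as np

# Pick 10 distinct Subject_IDs at random
exclude_ids = np.random.choice(df['Subject_ID'].unique(), 10, replace=False)

# Keep only the rows whose Subject_ID was not picked
df_new = df[~df['Subject_ID'].isin(exclude_ids)]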
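
To check the approach end to end, here is a minimal self-contained sketch on toy data. The toy DataFrame (20 subjects with 5 rows each), the helper name drop_random_groups, and the seed parameter are assumptions made for illustration and are not part of the original question. It samples without replacement so that exactly n distinct subjects are dropped:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 20 subjects with 5 rows each, 100 rows in total.
df = pd.DataFrame({
    "Subject_ID": np.repeat(np.arange(20), 5),
    "value": np.arange(100),
})


def drop_random_groups(frame, column, n, seed=None):
    """Return a copy of `frame` without the rows of `n` randomly chosen groups.

    Hypothetical helper for illustration; `seed` makes the choice reproducible.
    """
    rng = np.random.default_rng(seed)
    ids = frame[column].unique()
    # replace=False guarantees n distinct IDs are selected.
    exclude = rng.choice(ids, size=n, replace=False)
    # Keep every row whose group ID was not selected.
    return frame[~frame[column].isin(exclude)].copy()


df_new = drop_random_groups(df, "Subject_ID", n=3, seed=0)
print(df_new["Subject_ID"].nunique())  # 17 subjects remain
print(len(df_new))                     # 85 rows remain
```

With your data you would call drop_random_groups(df, "Subject_ID", n=10). The number of rows removed depends on how many rows each selected subject has, so it will only be approximately 400.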

Upvotes: 2
