Reputation: 2704
I have a dataset of almost 3000k rows. These are the labels of the dataset.
Now I want to take about 10% of the rows of each label for early analysis and for trying out algorithms. The 10% is only a rough figure.
Of course, I want the rows to be shuffled, meaning that I do not want to do df[df['Label']=='BENIGN'].iloc[0:235909,:]
because that takes the first 235k rows of that label, whereas I want a random selection from it. How can I do this?
Upvotes: 0
Views: 271
Reputation: 150755
Try GroupBy.sample (available since pandas 1.1):
df.groupby('Label').sample(frac=0.1)
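As a minimal sketch of what this does, here is the same call on a small made-up frame. The label names other than BENIGN, the value column, and the row counts are assumptions for illustration only. The random_state argument is optional and only makes the draw repeatable.

```python
import pandas as pd

# Toy frame standing in for the real data (labels and sizes are illustrative).
df = pd.DataFrame({
    "Label": ["BENIGN"] * 100 + ["DoS"] * 50 + ["PortScan"] * 20,
    "value": range(170),
})

# Draw 10% of the rows of each label at random, without replacement.
sample = df.groupby("Label").sample(frac=0.1, random_state=0)

# Optionally shuffle the result so rows of one label are not contiguous.
sample = sample.sample(frac=1, random_state=0)

print(sample["Label"].value_counts())
```

The rows within each label are picked at random rather than taken from the top, and the original index is kept, so the sampled rows can still be traced back to df.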
Edit: To sample a different fraction for a class:
df.groupby('Label').apply(lambda x: x.sample(frac=0.01 if x.Label.iloc[0]=='BENIGN' else 0.1))
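If several labels need their own fraction, one possible alternative is to keep the fractions in a dictionary and sample each group in a loop. This is only a sketch: the names fracs and default_frac, the DoS label, and the toy data are hypothetical and not part of the original question.

```python
import pandas as pd

# Toy frame standing in for the real data (labels and sizes are illustrative).
df = pd.DataFrame({
    "Label": ["BENIGN"] * 200 + ["DoS"] * 50,
    "value": range(250),
})

# Hypothetical per-label fractions; labels not listed use default_frac.
fracs = {"BENIGN": 0.01}
default_frac = 0.1

# Sample each label's rows with its own fraction, then stitch the parts together.
parts = [
    group.sample(frac=fracs.get(label, default_frac), random_state=0)
    for label, group in df.groupby("Label")
]
sample = pd.concat(parts)

print(sample["Label"].value_counts())
```

Compared with the apply version, this avoids reading the label back out of each group and returns a frame with the original index instead of an extra index level for the label.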
Upvotes: 1