How to get x% of a dataset in pandas

Question

I have a dataset of almost 3000k rows, and each row has a label in the Label column.

Now I want to take about 10% of the rows for each label, for early analysis and for trying out algorithms. The 10% is only a rough figure.

Of course, I want the rows to be picked at random. I do not want to do df[df['Label']=='BENIGN'].iloc[0:235909,:], because that only takes the first 235k rows of that label. How can I do this?

Quang Hoang · Accepted Answer

Try sample on the grouped frame:

df.groupby('Label').sample(frac=0.1)
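As a quick check of what this returns, here is a minimal sketch on made-up data. The toy frame, the 'DDoS' label, and the value column are assumptions for illustration only, and groupby(...).sample needs pandas 1.1 or newer. Passing random_state is optional; it only makes the draw repeatable.

```python
import pandas as pd

# Hypothetical toy data: 100 'Benign' rows and 20 'DDoS' rows.
df = pd.DataFrame({
    "Label": ["Benign"] * 100 + ["DDoS"] * 20,
    "value": range(120),
})

# Draw 10% of the rows of each label at random, without replacement.
sample = df.groupby("Label").sample(frac=0.1, random_state=0)

# Each label contributes about 10% of its own rows.
counts = sample["Label"].value_counts().to_dict()
print(counts)  # {'Benign': 10, 'DDoS': 2}
```

The result comes back ordered by label. If the final frame should also be mixed across labels, calling sample.sample(frac=1) on it afterwards shuffles all the rows.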

Edit: To sample a different fraction for a class:

df.groupby('Label').apply(lambda x: x.sample(frac=0.01 if x.Label.iloc[0]=='Benign' else 0.1))
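If several classes need their own fraction, one possible variant is to keep the fractions in a dict and sample each group in a plain loop. This is only a sketch: the fracs dict, default_frac, and the toy data are hypothetical names and values, not part of the original question. It also avoids depending on how apply treats the grouping column, which I believe has changed across pandas versions.

```python
import pandas as pd

# Hypothetical toy data: 1000 'Benign' rows and 50 'DDoS' rows.
df = pd.DataFrame({
    "Label": ["Benign"] * 1000 + ["DDoS"] * 50,
    "value": range(1050),
})

# Hypothetical per-label fractions; labels not listed use default_frac.
fracs = {"Benign": 0.01}
default_frac = 0.1

# Sample each label's rows with its own fraction, then stitch the parts together.
parts = [
    group.sample(frac=fracs.get(label, default_frac), random_state=0)
    for label, group in df.groupby("Label")
]
sample = pd.concat(parts)

counts = sample["Label"].value_counts().to_dict()
print(counts)  # {'Benign': 10, 'DDoS': 5}
```

The result keeps the original index and the Label column, so it can be used like any other slice of df.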
