Ahmad Anis

Reputation: 2704

How to get x% of a dataset in pandas

I have a dataset of almost 3,000k rows. These are the labels of the dataset.


Now I want to take 10% of the rows of each label for early analysis and algorithm development; the counts only need to be a rough estimate.

Of course, I want the rows shuffled. I do not want to do df[df['Label'] == 'BENIGN'].iloc[0:235909, :], because that returns the first 235,909 rows in file order, whereas I want a random selection of rows. How can I do this?
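For a single label, the iloc approach can be made random by shuffling first. The sketch below is a minimal illustration on hypothetical toy data (the column name value and the DDoS label are assumptions, not from the question): shuffling all BENIGN rows with sample(frac=1) and then slicing gives a random subset, and sample(n=...) does the same in one step.

```python
import pandas as pd

# Hypothetical toy data standing in for the real ~3M-row dataset.
df = pd.DataFrame({
    "Label": ["BENIGN"] * 20 + ["DDoS"] * 10,
    "value": range(30),
})

benign = df[df["Label"] == "BENIGN"]
n = int(len(benign) * 0.1)  # 10% of the BENIGN rows

# Shuffle all BENIGN rows first, then take the first n; this is a random
# selection rather than the first n rows in file order.
shuffled_head = benign.sample(frac=1, random_state=0).iloc[:n]

# More direct: draw n rows without replacement.
direct = benign.sample(n=n, random_state=0)

print(shuffled_head)
print(direct)
```

This handles one label at a time; repeating it for every label is what a groupby-based sample avoids.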

Upvotes: 0

Views: 271

Answers (1)

Quang Hoang

Reputation: 150755

Try GroupBy.sample (available since pandas 1.1):

df.groupby('Label').sample(frac=0.1)
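A self-contained sketch of the one-liner above, on hypothetical toy data (the labels and the value column are assumptions). Passing random_state makes the draw reproducible; each label contributes round(0.1 * group size) rows, sampled without replacement.

```python
import pandas as pd

# Hypothetical toy labels; the real dataset has ~3M rows.
df = pd.DataFrame({
    "Label": ["BENIGN"] * 50 + ["DDoS"] * 30 + ["PortScan"] * 20,
    "value": range(100),
})

# Draw 10% of the rows of every label, without replacement.
# random_state makes the result reproducible across runs.
subset = df.groupby("Label").sample(frac=0.1, random_state=42)

print(subset["Label"].value_counts())
```

The result comes back ordered group by group; chaining .sample(frac=1) afterwards shuffles the labels together if a fully mixed order is needed.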

Edit: To sample a different fraction for a class:

df.groupby('Label', group_keys=False).apply(lambda x: x.sample(frac=0.01 if x.name == 'BENIGN' else 0.1))
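An alternative sketch that avoids apply: iterate over the groups, look up each label's fraction in a dict, and concatenate the pieces. The fractions dict, the default of 0.1, and the toy data are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data; BENIGN dominates, as in the question.
df = pd.DataFrame({
    "Label": ["BENIGN"] * 200 + ["DDoS"] * 30 + ["PortScan"] * 20,
    "value": range(250),
})

# Assumed per-label fractions: 1% for BENIGN, 10% for everything else.
fractions = {"BENIGN": 0.01}
default_frac = 0.1

# Sample each group with its own fraction, then stitch the pieces together.
parts = [
    group.sample(frac=fractions.get(label, default_frac), random_state=0)
    for label, group in df.groupby("Label")
]
subset = pd.concat(parts)

print(subset["Label"].value_counts())
```

Keeping the fractions in a dict makes it easy to tune several classes at once without nesting conditionals inside a lambda.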

Upvotes: 1
