Adrian

Reputation: 373

Equally distribute a pandas DataFrame based on a column

I have a Pandas DataFrame with unevenly distributed labels:

a      label
0        0
1        0
2        0
3        0
4        0
        ..
65693    7
65694    7
65695    7
65696    7
65697    7

"Rows" per label:

1: 7673
2: 28930
3: 615
4: 7619
5: 3888
6: 2853
7: 5312
0: 8808

Now I need another DataFrame in which each class appears exactly 615 times (the count of the rarest label), so the resulting DataFrame will contain 8 x 615 = 4920 rows.

Thank you for your help.

Upvotes: 2

Views: 1233

Answers (1)

jezrael

Reputation: 863166

Use Series.value_counts to get the size of the smallest class, then GroupBy.head to keep the first n rows of each label:

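n = df['label'].value_counts().min()  # size of the smallest class (615 here)
df_balanced = df.groupby('label').head(n)  # first n rows of each label, original order kept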
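
Note that head(n) always keeps the first n rows of each group. If a random subset per label is preferred, GroupBy.sample (available since pandas 1.1) may be an alternative. The following is only a minimal sketch on toy data: the column name `label` is taken from the question, while the `value` column, the toy frame, and the `random_state` seed are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the question's data (assumed): 5, 3 and 8 rows per label.
df = pd.DataFrame({"label": [0] * 5 + [1] * 3 + [2] * 8})
df["value"] = range(len(df))  # hypothetical payload column

# Size of the smallest class.
n = int(df["label"].value_counts().min())

# Deterministic: first n rows of each label, original row order preserved.
balanced_head = df.groupby("label").head(n)

# Random: n rows drawn without replacement from each label.
# random_state is fixed here only to make the draw reproducible.
balanced_random = df.groupby("label").sample(n=n, random_state=0)

print(balanced_head["label"].value_counts().sort_index())
print(balanced_random["label"].value_counts().sort_index())
```

Both results contain n rows per label. The head variant is reproducible without a seed, whereas the sample variant avoids a bias toward rows that happen to come first in the frame.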

Upvotes: 4
