Reputation: 373
I have a Pandas DataFrame with unevenly distributed labels:
a label
0 0
1 0
2 0
3 0
4 0
..
65693 7
65694 7
65695 7
65696 7
65697 7
"Rows" per label:
1: 7673
2: 28930
3: 615
4: 7619
5: 3888
6: 2853
7: 5312
0: 8808
Now I need another DataFrame in which every class appears exactly 615 times (the size of the smallest class), so the result will contain 8 x 615 = 4920 rows.
Thank you for your help.
Upvotes: 2
Views: 1233
Reputation: 863166
Use GroupBy.head with Series.value_counts to get the minimal count:
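n = df['label'].value_counts().min()  # size of the smallest class (615 here)
df_balanced = df.groupby('label').head(n)  # first n rows of each label, original order kept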
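Here is a small self-contained sketch of the same idea on made-up data, in case it helps to check the behaviour. The toy DataFrame, its values, and the variable names are assumptions for illustration only. The column name `label` is taken from the printout in the question. Note that `head` always takes the first `n` rows of each group. If a random subset per class is preferred, `groupby(...).sample` (available in pandas 1.1 and later) is one alternative, shown here as well.

```python
import pandas as pd

# Hypothetical toy data: three classes with 5, 3 and 4 rows respectively.
df = pd.DataFrame({
    "a": range(12),
    "label": [0] * 5 + [1] * 3 + [2] * 4,
})

# Size of the smallest class.
n = df["label"].value_counts().min()

# Deterministic: first n rows of every label, original row order preserved.
balanced_head = df.groupby("label").head(n)

# Alternative (pandas >= 1.1): n randomly chosen rows per label.
balanced_random = df.groupby("label").sample(n=n, random_state=0)

print(balanced_head["label"].value_counts().sort_index())
print(balanced_random["label"].value_counts().sort_index())
```

Both results contain `n` rows per class, so on the data from the question either one should give 8 x 615 rows.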
Upvotes: 4