Reputation: 417
I am trying to create a new DataFrame containing the top 10% of count records per group.
An example of initial df is as follows;
date name count
2014-12-14 Jerry 1
2014-12-21 Jerry 2
2015-01-11 Jerry 3
2015-02-01 Jerry 4
2015-02-08 Jerry 5
2015-03-01 Jerry 6
2015-03-08 Jerry 7
2015-03-15 Jerry 8
2015-03-22 Jerry 9
2015-04-26 Jerry 10
2014-12-14 Tom 1
2014-12-21 Tom 2
2015-01-11 Tom 3
2015-02-01 Tom 4
2015-02-08 Tom 5
2015-03-01 Tom 6
2015-03-08 Tom 7
2015-03-15 Tom 8
2015-03-22 Tom 9
2015-04-26 Tom 10
The above DataFrame is simply a snippet of the full DataFrame which contains numerous names, and contains weekly count
information per name over a year period.
The required output I would like is as follows.
date name count
2015-04-26 Jerry 10
2015-04-26 Tom 10
I would appreciate any assistance.
Upvotes: 1
Views: 767
Reputation: 863531
First sort_values
and then groupby
with custom lambda function for get 10%
by rows per groups:
df1 = (df.sort_values(['name','count'], ascending=[True, False])
.groupby('name', group_keys=False)
.apply(lambda x: x.head(int(len(x) / 10))))
print (df1)
date name count
9 2015-04-26 Jerry 10
19 2015-04-26 Tom 10
Upvotes: 2