Reputation: 123
Groupby one column and get all the rows for those groups except top 5.
This is done to get top 5 rows for every group
x = pd.DataFrame({'A': ['p', 'p','p','p','p','p','p','p', 'q', 'q', 'q', 'q', 'q', 'q'], 'B' : [3,5,4,7,10,2,3,7,1,4,9,9,2,5]})
x = x.sort_values(by = 'Total', ascending = False).groupby(['Identifier']).head(5)
I want all rows for that group except top 5
Desired Result:
A B
p 3
p 3
p 2
q 1
Upvotes: 0
Views: 897
Reputation: 224
use pd.iloc[5:]
df = pd.DataFrame({'A': ['p', 'p','p','p','p','p','p','p', 'q', 'q', 'q', 'q', 'q'], 'B' : [3,5,4,7,10,2,3,7,1,4,9,9,2]})
df=df.sort_values(by = 'B', ascending = False)
Output:
A B
4 p 10
10 q 9
11 q 9
3 p 7
7 p 7
1 p 5
2 p 4
9 q 4
0 p 3
6 p 3
5 p 2
12 q 2
8 q 1
df.iloc[5:]
Output :
A B
1 p 5
2 p 4
9 q 4
0 p 3
6 p 3
5 p 2
12 q 2
8 q 1
Upvotes: 1
Reputation: 7625
Possible solution is following:
In [1]: x = pd.DataFrame(
{
'A': ['p', 'p','p','p','p','p','p','p', 'q', 'q', 'q', 'q', 'q', 'q'],
'B': [3, 5, 4, 7, 10, 2, 3, 7, 1, 4, 9, 9, 2, 5]
}
)
x = x.sort_values(by=['A', 'B'], ascending=False)
x['shifted'] = x.groupby(['A'])['B'].shift(5)
x = x.dropna()[['A', 'B']]
x
Out [2]: A B
8 q 1
0 p 3
6 p 3
5 p 2
Upvotes: 0
Reputation: 150735
You can try this:
(x.sort_values('B', ascending=False)
.groupby('A', as_index=False,
group_keys=False)
.apply(lambda x: x.iloc[5:])
)
Output:
A B
0 p 3
6 p 3
5 p 2
8 q 1
Upvotes: 0