TayyabRahmani
TayyabRahmani

Reputation: 123

Groupby and get all rows except top 5

Groupby one column and get all the rows for those groups except top 5.

This is done to get top 5 rows for every group

x = pd.DataFrame({'A': ['p', 'p','p','p','p','p','p','p', 'q', 'q', 'q', 'q', 'q', 'q'], 'B' : [3,5,4,7,10,2,3,7,1,4,9,9,2,5]})
x = x.sort_values(by = 'Total', ascending = False).groupby(['Identifier']).head(5)

I want all rows for that group except top 5

Desired Result:

A B

p 3

p 3

p 2

q 1

Upvotes: 0

Views: 897

Answers (3)

Demont Zhang
Demont Zhang

Reputation: 224

use pd.iloc[5:]

df = pd.DataFrame({'A': ['p', 'p','p','p','p','p','p','p', 'q', 'q', 'q', 'q', 'q'], 'B' : [3,5,4,7,10,2,3,7,1,4,9,9,2]})
df=df.sort_values(by = 'B', ascending = False)

Output:

    A   B
4   p   10
10  q   9
11  q   9
3   p   7
7   p   7
1   p   5
2   p   4
9   q   4
0   p   3
6   p   3
5   p   2
12  q   2
8   q   1

df.iloc[5:]

Output :

    A   B
1   p   5
2   p   4
9   q   4
0   p   3
6   p   3
5   p   2
12  q   2
8   q   1

Upvotes: 1

Jaroslav Bezděk
Jaroslav Bezděk

Reputation: 7625

Possible solution is following:

In  [1]: x = pd.DataFrame(
            {
              'A': ['p', 'p','p','p','p','p','p','p', 'q', 'q', 'q', 'q', 'q', 'q'], 
              'B': [3, 5, 4, 7, 10, 2, 3, 7, 1, 4, 9, 9, 2, 5]
            }
         )
         x = x.sort_values(by=['A', 'B'], ascending=False)
         x['shifted'] = x.groupby(['A'])['B'].shift(5)
         x = x.dropna()[['A', 'B']]
         x

Out [2]:    A   B
         8  q   1 
         0  p   3
         6  p   3
         5  p   2

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150735

You can try this:

(x.sort_values('B', ascending=False)
  .groupby('A', as_index=False,
          group_keys=False)
  .apply(lambda x: x.iloc[5:])
)

Output:

   A  B
0  p  3
6  p  3
5  p  2
8  q  1

Upvotes: 0

Related Questions