Gerry

Reputation: 626

Drop last n rows within pandas dataframe groupby

I have a dataframe df and want to drop the last n rows within each group, where the groups are defined by a set of columns. For example, say df is defined as below and the grouping columns are a and b:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':['abd']*4 + ['pqr']*5 + ['xyz']*7, 'b':['john']*7 + ['doe']*9, 'c':range(16), 'd':range(1000,1016)})
>>> df
      a     b   c     d
0   abd  john   0  1000
1   abd  john   1  1001
2   abd  john   2  1002
3   abd  john   3  1003
4   pqr  john   4  1004
5   pqr  john   5  1005
6   pqr  john   6  1006
7   pqr   doe   7  1007
8   pqr   doe   8  1008
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
13  xyz   doe  13  1013
14  xyz   doe  14  1014
15  xyz   doe  15  1015
>>> 

Desired output for n=2 is as follows:

>>> df
      a     b   c     d
0   abd  john   0  1000
1   abd  john   1  1001
4   pqr  john   4  1004
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
13  xyz   doe  13  1013
>>>

Desired output for n=3 is as follows:

>>> df
      a     b   c     d
0   abd  john   0  1000
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
>>> 

Upvotes: 5

Views: 1900

Answers (2)

nimbous

Reputation: 1527

You can use groupby(...).tail(n) to get the index labels of the last n rows in each group, then pass those labels to drop:

n = 2
df.drop(df.groupby(['a','b']).tail(n).index, axis=0)
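For anyone who wants to check this against the question's data, here is a minimal self-contained sketch. The helper name drop_last_n is only illustrative (it is not a pandas function), and it assumes the index labels are unique, since drop removes rows by label.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
    'b': ['john'] * 7 + ['doe'] * 9,
    'c': range(16),
    'd': range(1000, 1016),
})

def drop_last_n(frame, keys, n):
    """Hypothetical helper: drop the last n rows of each group defined by keys."""
    # Index labels of the last n rows in every group
    tail_index = frame.groupby(keys).tail(n).index
    # drop() works by label, so this assumes the index has no duplicates
    return frame.drop(tail_index)

result = drop_last_n(df, ['a', 'b'], 2)
print(result.index.tolist())  # [0, 1, 4, 9, 10, 11, 12, 13]
```

Groups with n or fewer rows (here pqr/doe, which has only two rows) disappear entirely, which matches the desired output in the question.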

Upvotes: 7

Chris

Reputation: 16162

You could get the index values of the tail(n) records per group and use .loc with ~ to exclude those.

n = 3
df.loc[~df.index.isin(df.groupby(['a', 'b']).tail(n).index)]

Output

      a     b   c     d
0   abd  john   0  1000
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
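Note that matching on index labels only behaves as intended when the index is unique. If that might not hold, one possible variant (a sketch, not part of the original answer) is to build a positional mask with cumcount(ascending=False), which numbers each row by its distance from the end of its group. The helper name drop_last_n_by_position is made up for illustration.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
    'b': ['john'] * 7 + ['doe'] * 9,
    'c': range(16),
    'd': range(1000, 1016),
})

def drop_last_n_by_position(frame, keys, n):
    """Hypothetical helper: keep rows that are at least n rows from their group's end."""
    # With ascending=False the last row of each group gets 0, the one before it 1, ...
    from_end = frame.groupby(keys).cumcount(ascending=False)
    # Use a plain boolean array so the mask is applied by position, not by label
    return frame[(from_end >= n).to_numpy()]

result = drop_last_n_by_position(df, ['a', 'b'], 3)
print(result.index.tolist())  # [0, 9, 10, 11, 12]
```

Because the mask is positional, this should give the same rows even if the frame has repeated index labels.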

Upvotes: 3
