I have a dataframe df where I want to drop the last n rows within each group, where the groups are defined by a set of columns. For example, say df is defined as below and the groups are defined by columns a and b:
>>> import pandas as pd
>>> df = pd.DataFrame({'a':['abd']*4 + ['pqr']*5 + ['xyz']*7, 'b':['john']*7 + ['doe']*9, 'c':range(16), 'd':range(1000,1016)})
>>> df
a b c d
0 abd john 0 1000
1 abd john 1 1001
2 abd john 2 1002
3 abd john 3 1003
4 pqr john 4 1004
5 pqr john 5 1005
6 pqr john 6 1006
7 pqr doe 7 1007
8 pqr doe 8 1008
9 xyz doe 9 1009
10 xyz doe 10 1010
11 xyz doe 11 1011
12 xyz doe 12 1012
13 xyz doe 13 1013
14 xyz doe 14 1014
15 xyz doe 15 1015
>>>
Desired output for n=2 is as follows:
>>> df
a b c d
0 abd john 0 1000
1 abd john 1 1001
4 pqr john 4 1004
9 xyz doe 9 1009
10 xyz doe 10 1010
11 xyz doe 11 1011
12 xyz doe 12 1012
13 xyz doe 13 1013
>>>
Desired output for n=3 is as follows:
>>> df
a b c d
0 abd john 0 1000
9 xyz doe 9 1009
10 xyz doe 10 1010
11 xyz doe 11 1011
12 xyz doe 12 1012
>>>
Upvotes: 5
You can use groupby with tail to find the last n rows of each group, then remove them by index label with drop:
n = 2
# tail(n) returns the last n rows of each (a, b) group; drop removes them by index label
df.drop(df.groupby(['a','b']).tail(n).index, axis=0)
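A self-contained sketch of the same groupby/tail/drop approach, rebuilding the df from the question so it can be run as-is. The helper name drop_last_n is hypothetical. Because drop works on index labels, this assumes the labels are unique, which holds for the default RangeIndex used here.

```python
import pandas as pd


def drop_last_n(df: pd.DataFrame, keys: list, n: int) -> pd.DataFrame:
    """Drop the last n rows of each group (hypothetical helper).

    Assumes df has unique index labels, because drop() removes rows by label.
    """
    # tail(n) returns the last n rows of every group, keeping their original labels
    tail_rows = df.groupby(keys).tail(n)
    # Remove those labels from the original frame; remaining rows keep their order
    return df.drop(tail_rows.index, axis=0)


df = pd.DataFrame({'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
                   'b': ['john'] * 7 + ['doe'] * 9,
                   'c': range(16),
                   'd': range(1000, 1016)})

print(drop_last_n(df, ['a', 'b'], 2).index.tolist())  # [0, 1, 4, 9, 10, 11, 12, 13]
print(drop_last_n(df, ['a', 'b'], 3).index.tolist())  # [0, 9, 10, 11, 12]
```

The surviving index labels match the desired outputs in the question for both n=2 and n=3.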
Upvotes: 7
You could get the index values of the tail(n) records per group and use .loc with ~ to exclude those.
n=3
df.loc[~df.index.isin(df.groupby(['a','b']).tail(n).index.values)]
Output
a b c d
0 abd john 0 1000
9 xyz doe 9 1009
10 xyz doe 10 1010
11 xyz doe 11 1011
12 xyz doe 12 1012
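Both answers identify the rows to remove by index label, so if df has duplicate index labels they can remove more rows than intended (every row that shares a label with a tail row). A possible alternative, swapped in here and not taken from either answer, is a positional boolean mask built with GroupBy.cumcount(ascending=False), which numbers each group's rows from the end (0 for the last row); keeping rows numbered n or higher drops exactly the last n rows per group regardless of the index. The helper name drop_last_n_mask is hypothetical.

```python
import pandas as pd


def drop_last_n_mask(df: pd.DataFrame, keys: list, n: int) -> pd.DataFrame:
    """Drop the last n rows of each group via a positional mask (hypothetical helper)."""
    # cumcount(ascending=False) numbers each group's rows from the end: last row -> 0
    from_end = df.groupby(keys).cumcount(ascending=False)
    # Keep rows at least n positions from their group's end; .to_numpy() avoids
    # any index alignment, so duplicate labels do not matter
    return df[(from_end >= n).to_numpy()]


df = pd.DataFrame({'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
                   'b': ['john'] * 7 + ['doe'] * 9,
                   'c': range(16),
                   'd': range(1000, 1016)})

print(drop_last_n_mask(df, ['a', 'b'], 3)['c'].tolist())  # [0, 9, 10, 11, 12]

# Same data, but every row carries the index label 0
dup = df.copy()
dup.index = [0] * 16
print(drop_last_n_mask(dup, ['a', 'b'], 3)['c'].tolist())  # [0, 9, 10, 11, 12]
# Label-based drop removes every row labelled 0, i.e. all of them
print(len(dup.drop(dup.groupby(['a', 'b']).tail(3).index, axis=0)))  # 0
```

On the question's frame both approaches agree; the difference only shows up when index labels repeat.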
Upvotes: 3