user1718097

Create a mask to identify the final two rows of each group in a Pandas dataframe

I have a Pandas dataframe that includes a grouping variable. An example can be produced using:

import pandas as pd
df = pd.DataFrame({'grp':['a','a','b','b','b','c','d','d','d','d'],
                   'data':[4,5,3,6,7,8,9,8,7,3]})

...which looks like:

  grp  data
0   a     4
1   a     5
2   b     3
3   b     6
4   b     7
5   c     8
6   d     9
7   d     8
8   d     7
9   d     3

I can retrieve the last two rows of each group using:

dfgrp = df.groupby('grp').tail(2)
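Note that tail(2) keeps each selected row's original index label, so the rows it returns can be matched back to df. A minimal check on the example data:

```python
import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b', 'b', 'b', 'c', 'd', 'd', 'd', 'd'],
                   'data': [4, 5, 3, 6, 7, 8, 9, 8, 7, 3]})

# tail(2) returns up to two trailing rows per group, in the original row order,
# and keeps their original index labels.
dfgrp = df.groupby('grp').tail(2)
print(dfgrp.index.tolist())  # [0, 1, 3, 4, 5, 8, 9]
```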

However, I would like to produce a boolean mask that identifies the last two rows of each group (or the only row, if a group has just one), ideally producing output that looks like:

0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
8     True
9     True
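To pin down the requirement: the desired mask is the boolean that, used to filter df, selects exactly the same rows as tail(2). A quick check, using the expected values above typed in by hand (the name expected is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b', 'b', 'b', 'c', 'd', 'd', 'd', 'd'],
                   'data': [4, 5, 3, 6, 7, 8, 9, 8, 7, 3]})

# Expected mask from the question, entered by hand (hypothetical name).
expected = pd.Series([True, True, False, True, True, True, False, False, True, True])

# Filtering df with the mask should give the same rows as tail(2).
print(df[expected].equals(df.groupby('grp').tail(2)))  # True
```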

I thought this would be relatively straightforward, but I haven't been able to find a solution. Suggestions would be greatly appreciated.


Answers (1)

Nick Becker

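If your index is unique, you can do this with isin: take the index labels of the rows returned by groupby('grp').tail(2), then check which labels of the full index are among them.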

import pandas as pd

df = pd.DataFrame({'grp':['a','a','b','b','b','c','d','d','d','d'],
                   'data':[4,5,3,6,7,8,9,8,7,3]})
# Flag each row whose index label is among those returned by tail(2)
df['mask'] = df.index.isin(df.groupby('grp').tail(2).index)
print(df)

  grp  data   mask
0   a     4   True
1   a     5   True
2   b     3  False
3   b     6   True
4   b     7   True
5   c     8   True
6   d     9  False
7   d     8  False
8   d     7   True
9   d     3   True
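If the index might contain duplicate labels, one alternative (not part of the original answer) is GroupBy.cumcount(ascending=False), which numbers rows by position from the end of each group, so index labels play no role. The sketch below also builds a small hypothetical frame, dup, to show the isin approach marking an extra row when labels repeat:

```python
import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b', 'b', 'b', 'c', 'd', 'd', 'd', 'd'],
                   'data': [4, 5, 3, 6, 7, 8, 9, 8, 7, 3]})

# cumcount(ascending=False) gives 0 for the last row of each group,
# 1 for the second-to-last, and so on; it works on positions, not labels.
df['mask'] = df.groupby('grp').cumcount(ascending=False) < 2
print(df['mask'].tolist())

# Hypothetical frame with a duplicated index label (0 appears twice).
dup = pd.DataFrame({'grp': ['a', 'a', 'a', 'b'], 'data': [1, 2, 3, 4]},
                   index=[0, 1, 2, 0])
by_isin = dup.index.isin(dup.groupby('grp').tail(2).index)
by_cumcount = dup.groupby('grp').cumcount(ascending=False) < 2
print(by_isin.tolist())      # [True, True, True, True]  (first row wrongly flagged)
print(by_cumcount.tolist())  # [False, True, True, True]
```

The isin version goes wrong because label 0 belongs to the last row of group b, so the first row of group a, which shares that label, is flagged as well.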

