Roo

Reputation: 912

pandas groupby latest observation for each group

I have a panel dataframe (one row per ID and year) and want to keep only the most recent row for each ID. Here is the table:

import pandas as pd
df = pd.DataFrame({'ID': [1, 1, 2, 3], 'Year': [2018, 2019, 2019, 2020], 'Var1': list("abcd"), 'Var2': list("efgh")})

   ID  Year Var1 Var2
0   1  2018    a    e
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

and the end result would be the latest row for each ID:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

Upvotes: 0

Views: 700

Answers (2)

Scott Boston

Reputation: 153460

Use drop_duplicates:

df.sort_values('Year').drop_duplicates('ID', keep='last')

Output:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h
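
To check this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question and reverses the row order first, to show that the sort is what makes keep='last' pick the latest year. The reversed frame and the names shuffled and latest are my own additions, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Year": [2018, 2019, 2019, 2020],
    "Var1": list("abcd"),
    "Var2": list("efgh"),
})

# Reverse the rows so the frame is no longer in chronological order
# (an assumption made here only to exercise the sort step).
shuffled = df.iloc[::-1]

# Sort by Year so the latest row of each ID comes last, then keep that row.
latest = shuffled.sort_values("Year").drop_duplicates("ID", keep="last")

# Restore the original index order for display.
print(latest.sort_index())
```

Without the sort_values call, keep='last' would simply keep whichever row of each ID happens to appear last in the frame.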

Upvotes: 1

Roy2012

Reputation: 12503

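Use tail, which returns the last row of each group in the frame's current order (so sort by Year first if your data is not already sorted):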

df.groupby("ID").tail(1)

The output is:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

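Another alternative is to use last. Note that it moves ID into the index and returns the last non-null value in each column, so when there are missing values it can combine values from different rows: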

df.groupby("ID").last()
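
The difference between the two only shows up with missing values. Here is a small sketch using a made-up frame (not the one from the question) where the latest row for ID 1 has a NaN in Var1; the names by_tail and by_last are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 2019 row for ID 1 is missing Var1.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Year": [2018, 2019, 2019],
    "Var1": ["a", np.nan, "c"],
})

# tail(1) keeps the last row of each group unchanged, NaN included.
by_tail = df.groupby("ID").tail(1)

# last() takes the last non-null value per column, so ID 1 gets
# Year 2019 from its second row but Var1 "a" from its first row.
by_last = df.groupby("ID").last()

print(by_tail)
print(by_last)
```

If you need the actual latest row, missing values and all, tail(1) (or the drop_duplicates approach) is probably the safer choice.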

Upvotes: 1
