Reputation: 912
I have a panel DataFrame (keyed by ID and year) and want to keep only the most recent row for each ID. Here is the table:
import pandas as pd
df = pd.DataFrame({'ID': [1, 1, 2, 3], 'Year': [2018, 2019, 2019, 2020], 'Var1': list("abcd"), 'Var2': list("efgh")})
and the end result should keep only the latest row for each ID (2019 for IDs 1 and 2, 2020 for ID 3).
Upvotes: 0
Views: 700
Reputation: 153460
Sort by Year, then use drop_duplicates to keep the last row for each ID:
df.sort_values('Year').drop_duplicates('ID', keep='last')
Output:
   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h
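To show why the sort step matters, here is a minimal self-contained sketch of the same approach. The shuffled row order and the trailing `sort_values("ID")` / `reset_index` calls are assumptions added for illustration; they are not part of the one-liner above.

```python
import pandas as pd

# Same rows as in the question, deliberately shuffled (illustrative assumption)
# so that the latest year for ID 1 is no longer the last row for that ID.
df = pd.DataFrame({
    "ID": [1, 3, 2, 1],
    "Year": [2019, 2020, 2019, 2018],
    "Var1": list("bdca"),
    "Var2": list("fhge"),
})

latest = (
    df.sort_values("Year")                   # oldest rows first, newest last
      .drop_duplicates("ID", keep="last")    # keep the newest row per ID
      .sort_values("ID")                     # optional: tidy ordering
      .reset_index(drop=True)                # optional: fresh 0..n-1 index
)
print(latest)
```

Without the `sort_values("Year")` call, `keep="last"` would keep whichever row happens to appear last for each ID, which here would be the 2018 row for ID 1.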
Upvotes: 1
Reputation: 12503
Use tail (this assumes the rows are already sorted by Year within each ID):
df.groupby("ID").tail(1)
The output is:
   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h
Another alternative is to use last, which returns one row per ID with ID as the index:
df.groupby("ID").last()
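The two calls are not fully interchangeable. The sketch below illustrates the difference on a modified copy of the question's data; the missing value in `Var1` is a hypothetical addition, and the explicit `sort_values("Year")` is an extra safeguard not present in the snippets above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Year": [2018, 2019, 2019, 2020],
    # Hypothetical: the latest row for ID 1 has a missing Var1.
    "Var1": ["a", np.nan, "c", "d"],
    "Var2": list("efgh"),
})

ordered = df.sort_values("Year")

# tail(1) returns the last whole row of each group, NaN included,
# and keeps ID as a regular column with the original row labels.
by_tail = ordered.groupby("ID").tail(1)

# last() takes the last non-null value of each column separately,
# and moves ID into the index.
by_last = ordered.groupby("ID").last()

print(by_tail)
print(by_last)
```

With missing data, `last()` can therefore combine values from different years into one row (here the 2019 `Year` with the 2018 `Var1` for ID 1), so `tail(1)` is the safer choice when whole rows must be preserved.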
Upvotes: 1