Reputation: 2680
I have a dataframe with various values in one column, and I want to keep only the 3 most recent entries for each unique value in that column.
I have:
Group Date Value
A 10/1/2017 4.4
A 8/3/2017 9.2
A 5/3/2017 8.4
A 4/4/2017 4.4
B 12/1/2015 4.4
B 8/3/2015 9.2
B 5/4/2015 8.4
B 4/5/2015 4.4
and I want:
Group Date Value
A 10/1/2017 4.4
A 8/3/2017 9.2
A 5/3/2017 8.4
B 12/1/2015 4.4
B 8/3/2015 9.2
B 5/4/2015 8.4
The dates are datetimes; I wasn't sure how else to represent them in this post.
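For reference, a minimal sketch of how such date strings parse into datetimes (not my actual loading code):
import pandas as pd

# Sketch only: pandas infers month/day/year for strings like these.
dates = pd.to_datetime(['10/1/2017', '8/3/2017', '5/3/2017', '4/4/2017'])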
I can get this with the following code:
new_df = pd.DataFrame()
for group in df['Group'].unique():
    # Take the first 3 rows for this group (df is already sorted by Date,
    # newest first).
    temp_df = df[df['Group'] == group]
    temp_df = temp_df[0:3]
    if new_df.empty:
        new_df = temp_df
    else:
        new_df = new_df.append(temp_df)  # append returns a new frame
Is there a more pythonic way to do this?
Thanks in advance.
Upvotes: 2
Views: 94
Reputation: 14113
You could use groupby with nlargest:
df.groupby('Group')[['Date', 'Value']].apply(lambda x: x.nlargest(3, 'Date'))
              Date  Value
Group
A     0 2017-10-01    4.4
      1 2017-08-03    9.2
      2 2017-05-03    8.4
B     4 2015-12-01    4.4
      5 2015-08-03    9.2
      6 2015-05-04    8.4
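For completeness, a self-contained sketch of this approach on the data from the question (column names as given there):
import pandas as pd

# Rebuild the question's sample data; dates are parsed into datetimes.
rows = [('A', '10/1/2017', 4.4), ('A', '8/3/2017', 9.2),
        ('A', '5/3/2017', 8.4), ('A', '4/4/2017', 4.4),
        ('B', '12/1/2015', 4.4), ('B', '8/3/2015', 9.2),
        ('B', '5/4/2015', 8.4), ('B', '4/5/2015', 4.4)]
df = pd.DataFrame(rows, columns=['Group', 'Date', 'Value'])
df['Date'] = pd.to_datetime(df['Date'])

# For each group, keep the 3 rows with the largest (most recent) Date.
print(df.groupby('Group')[['Date', 'Value']].apply(lambda g: g.nlargest(3, 'Date')))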
Upvotes: 2
Reputation: 6428
What about the following:
df.sort_values(by='Date', ascending=False).groupby('Group').head(3)
On my small test dataset this returns the following (I used head(2) here):
Group Date
1 c 2050-01-01
8 a 2032-02-03
0 a 2030-01-01
9 c 2029-01-01
10 b 2018-01-01
2 b 2017-02-03
As you can see, the rows for the various groups are no longer kept together. We can fix this by sorting first on 'Group' and then on 'Date':
df.sort_values(by=['Group', 'Date'], ascending=[True,False]).groupby('Group').head(3)
Group Date
8 a 2032-02-03
0 a 2030-01-01
10 b 2018-01-01
2 b 2017-02-03
1 c 2050-01-01
9 c 2029-01-01
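Applied to the data from the question, a runnable sketch of the same idea would be:
import pandas as pd

# The question's data, with Date parsed to datetimes.
df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Date': pd.to_datetime(['10/1/2017', '8/3/2017', '5/3/2017', '4/4/2017',
                            '12/1/2015', '8/3/2015', '5/4/2015', '4/5/2015']),
    'Value': [4.4, 9.2, 8.4, 4.4, 4.4, 9.2, 8.4, 4.4],
})

# Sort by Group, then by Date descending, and keep the first 3 rows of
# each group, which gives the 3 most recent entries per group.
print(df.sort_values(by=['Group', 'Date'], ascending=[True, False])
        .groupby('Group')
        .head(3))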
Upvotes: 2