Reputation: 85
I have a pandas DataFrame that I want to group by email, taking the max value of date while keeping the status column. The status column is not part of the groupby, though.
Example: given the following dataframe df
+-------------------------------+
| email | status | date |
+-------------------------------+
| test1 | viewed | 01/07/18 |
---------------------------------
| test1 |not viewed| 03/07/18 |
---------------------------------
| test2 |not viewed| 02/07/18 |
---------------------------------
| test2 | viewed | 01/07/18 |
---------------------------------
| test3 |not viewed| 03/07/18 |
---------------------------------
| test3 | viewed | 04/07/18 |
---------------------------------
I use the following code, but it drops the status column and I don't know how to keep it.
df.groupby(['email']).aggregate({'date': 'max'})
Desired output:
+-------------------------------+
| email | status | date |
+-------------------------------+
| test1 |not viewed| 03/07/18 |
---------------------------------
| test2 |not viewed| 02/07/18 |
---------------------------------
| test3 | viewed | 04/07/18 |
---------------------------------
In sum, I want to group by email, get the most recent date, and keep the status column.
Upvotes: 3
Views: 146
Reputation: 323326
You can use drop_duplicates:
# df['date'] = pd.to_datetime(df.date)  # uncomment if date is still stored as strings
df.sort_values('date').drop_duplicates(['email'], keep='last')
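For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same sort-then-deduplicate idea. It assumes the dates are dd/mm/yy strings (the question does not say), and the name `latest` is just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "email": ["test1", "test1", "test2", "test2", "test3", "test3"],
    "status": ["viewed", "not viewed", "not viewed", "viewed", "not viewed", "viewed"],
    "date": ["01/07/18", "03/07/18", "02/07/18", "01/07/18", "03/07/18", "04/07/18"],
})

# Assumption: dates are day/month/year. Parsing them makes the sort
# chronological instead of lexicographic.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")

# Sort oldest to newest, then keep only the last (newest) row per email.
latest = (
    df.sort_values("date")
      .drop_duplicates("email", keep="last")
      .sort_values("email")
      .reset_index(drop=True)
)
print(latest)
```

Because whole rows are kept, the status always comes from the same row as the retained date.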
Upvotes: 2
Reputation: 51395
Instead of agg, you could sort by date, use a groupby, and select the last row of each group (which will be the most recent):
df['date'] = pd.to_datetime(df.date)
df.sort_values('date').groupby('email', as_index=False).last()
   email      status       date
0  test1  not viewed 2018-03-07
1  test2  not viewed 2018-02-07
2  test3      viewed 2018-04-07
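One caveat worth checking on your own data: as far as I know, groupby `last()` picks the last non-null value of each column independently, so with missing statuses it can combine values from different rows. The sketch below uses a made-up two-row frame (not from the question) to contrast it with `tail(1)`, which keeps whole rows.

```python
import pandas as pd

# Hypothetical data: the newest row for this email has no status recorded.
df = pd.DataFrame({
    "email": ["a", "a"],
    "status": ["viewed", None],
    "date": pd.to_datetime(["2018-07-01", "2018-07-03"]),
})
ordered = df.sort_values("date")

# last() works column by column and skips nulls, so the status comes from
# the older row while the date comes from the newer one.
via_last = ordered.groupby("email", as_index=False).last()

# tail(1) returns the last row of each group as is, missing status included.
via_tail = ordered.groupby("email").tail(1)

print(via_last)
print(via_tail)
```

If the status column never contains missing values, both approaches should give the same rows.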
Upvotes: 3
Reputation: 59579
You can find the index of the maximum date for each group and then subset the original DataFrame.
import pandas as pd
df['date'] = pd.to_datetime(df.date)
df.loc[df.groupby('email').date.idxmax()]
Output:
   email      status       date
1  test1  not viewed 2018-03-07
2  test2  not viewed 2018-02-07
5  test3      viewed 2018-04-07
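Note that the output above shows 03/07/18 parsed as 2018-03-07, i.e. month first. If the dates are actually day/month/year (an assumption, since the question does not say), passing an explicit format avoids picking the wrong row on data where the two readings order differently. A sketch of the same idxmax approach with that change:

```python
import pandas as pd

# Sample data reconstructed from the question's table.
df = pd.DataFrame({
    "email": ["test1", "test1", "test2", "test2", "test3", "test3"],
    "status": ["viewed", "not viewed", "not viewed", "viewed", "not viewed", "viewed"],
    "date": ["01/07/18", "03/07/18", "02/07/18", "01/07/18", "03/07/18", "04/07/18"],
})

# Assumption: day/month/year. An explicit format removes the ambiguity.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")

# idxmax returns index labels, so this relies on the index being unique;
# reset it first if it may contain duplicates.
df = df.reset_index(drop=True)
latest = df.loc[df.groupby("email")["date"].idxmax()]
print(latest)
```

Unlike the sort-based answers, this keeps the original row labels (1, 2 and 5 here), which can be handy for tracing rows back to the source frame.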
Upvotes: 1