Reputation: 1779
How can I group by an ID variable while keeping only the last row entered for each day?
# toy data
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1],
                   'time': ['2011/10/10 10:00',
                            '2011/10/10 10:05',
                            '2011/10/11 14:00'],
                   'vals1': [3, 3, 1],
                   'vals2': [2, 3, 1]})
# set datetime
df['time'] = pd.to_datetime(df['time'])
# try to aggregate by day, keeping the last value
df.groupby('id')['time'].dt.floor('d').tail(1)  # fails: a SeriesGroupBy object has no .dt accessor
# expected result
df = pd.DataFrame({'id': [1, 1],
                   'time': ['2011/10/10 10:05',
                            '2011/10/11 14:00'],
                   'vals1': [3, 1],
                   'vals2': [3, 1]})
Upvotes: 1
Views: 106
Reputation: 71600
You have to group by both keys at once. You could use pd.Grouper:
>>> df.groupby(['id', pd.Grouper(key='time', freq='D')]).tail(1)
   id                time  vals1  vals2
1   1 2011-10-10 10:05:00      3      3
2   1 2011-10-11 14:00:00      1      1
Or group on both at once using dt.floor:
>>> df.groupby(['id', df['time'].dt.floor('D')]).tail(1)
   id                time  vals1  vals2
1   1 2011-10-10 10:05:00      3      3
2   1 2011-10-11 14:00:00      1      1
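One caveat: tail(1) keeps the last row in each group by position, not by timestamp, so it only returns the latest entry of the day if the rows are already in chronological order. Here is a minimal sketch that sorts first, assuming "last" should mean "latest time". The shuffled toy data and the names ordered, day and last_per_day are mine, not from the question, and I use dt.normalize() (which sets the time part to midnight) in place of dt.floor to get the calendar day.

```python
import pandas as pd

# Toy data from the question, deliberately shuffled so row order != time order
# (the shuffling is an assumption made for illustration).
df = pd.DataFrame({
    'id': [1, 1, 1],
    'time': ['2011/10/11 14:00', '2011/10/10 10:05', '2011/10/10 10:00'],
    'vals1': [1, 3, 3],
    'vals2': [1, 3, 2],
})
df['time'] = pd.to_datetime(df['time'])

# Sort so that the last row of each group is the one with the latest timestamp.
ordered = df.sort_values(['id', 'time'])

# Calendar day of each row: normalize() resets the time part to midnight.
day = ordered['time'].dt.normalize().rename('day')

# Group on id and day together, then keep the final row of every group.
last_per_day = ordered.groupby(['id', day]).tail(1).reset_index(drop=True)
print(last_per_day)
```

If your data is already sorted by time within each id, the sort is unnecessary and the one-liners above are enough.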
Upvotes: 3