John Stud

Reputation: 1779

Pandas: Groupby an ID variable and filter out datetime rows

How can I group by an ID variable while keeping only the last row entered for each day?

import pandas as pd

# toy data
df = pd.DataFrame({'id': [1, 1, 1],
                   'time': ['2011/10/10 10:00',
                            '2011/10/10 10:05',
                            '2011/10/11 14:00'],
                   'vals1': [3, 3, 1],
                   'vals2': [2, 3, 1]})

# set datetime
df['time'] = pd.to_datetime(df['time'])

# try to aggregate by day; getting the last value
df.groupby('id')['time'].dt.floor('d').tail(1)  # fails

# expected result (kept in a separate variable so df is not overwritten)
expected = pd.DataFrame({'id': [1, 1],
                         'time': ['2011/10/10 10:05',
                                  '2011/10/11 14:00'],
                         'vals1': [3, 1],
                         'vals2': [3, 1]})

Upvotes: 1

Views: 106

Answers (1)

U13-Forward

Reputation: 71600

You have to group by both the ID and the day in a single groupby call. One option is pd.Grouper:

>>> df.groupby(['id', pd.Grouper(key='time', freq='d')]).tail(1)
   id                time  vals1  vals2
1   1 2011-10-10 10:05:00      3      3
2   1 2011-10-11 14:00:00      1      1

Or pass the timestamps floored to the day as the second grouping key, using dt.floor:

>>> df.groupby(['id', df['time'].dt.floor('d')]).tail(1)
   id                time  vals1  vals2
1   1 2011-10-10 10:05:00      3      3
2   1 2011-10-11 14:00:00      1      1
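Both approaches rely on tail(1), which returns the last row of each group in the frame's current row order, not necessarily the latest timestamp. A minimal sketch, assuming you want the latest time per id and day even when rows arrive out of order: sort first, then group. The names df_sorted, via_grouper and via_floor are illustrative only, and the uppercase 'D' alias is used here on the assumption that it is accepted across pandas versions.

```python
import pandas as pd

# Same toy data as in the question, with 'time' parsed up front.
df = pd.DataFrame({
    'id': [1, 1, 1],
    'time': pd.to_datetime(['2011/10/10 10:00',
                            '2011/10/10 10:05',
                            '2011/10/11 14:00']),
    'vals1': [3, 3, 1],
    'vals2': [2, 3, 1],
})

# Sort so that "last row" in each group means "latest timestamp".
df_sorted = df.sort_values(['id', 'time'])

# Option 1: let pd.Grouper bucket the 'time' column by calendar day.
via_grouper = df_sorted.groupby(['id', pd.Grouper(key='time', freq='D')]).tail(1)

# Option 2: build the daily key explicitly by flooring each timestamp.
via_floor = df_sorted.groupby(['id', df_sorted['time'].dt.floor('D')]).tail(1)

print(via_grouper)
print(via_floor)
```

Because tail(1) returns the original rows, the 'time' column keeps its full timestamp rather than being replaced by the day bucket.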

Upvotes: 3
