Pandas: How to group by a datetime column, using only the time and discarding the date

Question

I have a dataframe with a datetime column. I want to group by the time component only and aggregate, e.g. by taking the mean.

I know that I can use pd.Grouper to group by date AND time, but it doesn't work on time only.

Say we have the following dataframe:

import numpy as np
import pandas as pd

drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
print(df.head())

               drange      time        c0        c1
0 2019-08-01 00:00:00  00:00:00  0.031946  0.159739
1 2019-08-01 00:01:00  00:01:00  0.809171  0.681942
2 2019-08-01 00:02:00  00:02:00  0.036720  0.133443
3 2019-08-01 00:03:00  00:03:00  0.650522  0.409797
4 2019-08-01 00:04:00  00:04:00  0.239262  0.814565

In this case, the following throws a TypeError:

grouper = pd.Grouper(key='time', freq='5min')
grouped = df.groupby(grouper).mean()

I could set key='drange' to group by date and time, and then:

Reset the index
Transform the new column to float
Bin with pd.cut
Cast back to time
Finally group-by and then aggregate

... But I wonder whether there is a cleaner way to achieve the same results.

ALollz · Accepted Answer

Series.dt.time/DatetimeIndex.time returns the time as datetime.time objects. This isn't great because pandas works best with timedelta64, so your 'time' column is stored with object dtype and loses all datetime functionality.
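To see the dtype difference directly, here is a minimal sketch. The ten-row sample and the variable names are made up for illustration, and 'min' is used as the minute alias because 'T' is deprecated in recent pandas versions.

```python
import pandas as pd

# Ten one-minute timestamps (hypothetical sample data)
stamps = pd.Series(pd.date_range('2019-08-01 00:00', periods=10, freq='min'))

as_time = stamps.dt.time                    # Python datetime.time objects
as_delta = stamps - stamps.dt.normalize()   # offset from midnight of the same day

print(as_time.dtype)    # object
print(as_delta.dtype)   # a timedelta64 dtype
```

The object column only holds plain Python objects, so time-aware operations such as flooring or pd.Grouper with a freq are not available on it, whereas the timedelta column keeps them.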

You can subtract off the normalized date to obtain the time as a timedelta so you can continue to use the datetime tools of pandas. You can floor this to group.

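s = (df.drange - df.drange.dt.normalize()).dt.floor('5min')

# Aggregate only the numeric columns; the object-dtype 'time' column cannot be averaged
df.groupby(s)[['c0', 'c1']].mean()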

                c0        c1
drange                      
00:00:00  0.436971  0.530201
00:05:00  0.441387  0.518831
00:10:00  0.465008  0.478130
...            ...       ...
23:45:00  0.523233  0.515991
23:50:00  0.468695  0.434240
23:55:00  0.569989  0.510291
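If you need datetime.time labels again after grouping (the "cast back to time" step from the question), one option is to add the timedelta index to an arbitrary midnight and take .time. This is only a sketch: the seeded two-day sample frame, the name out, and the 1970-01-01 anchor date are assumptions for illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Two full days of one-minute data with a fixed seed (hypothetical sample)
rng = np.random.default_rng(0)
drange = pd.date_range('2019-08-01 00:00', '2019-08-02 23:59', freq='min')
df = pd.DataFrame({'drange': drange, 'c0': rng.random(len(drange))})

# Time of day as a timedelta, floored to 5-minute bins
s = (df['drange'] - df['drange'].dt.normalize()).dt.floor('5min')
out = df.groupby(s)[['c0']].mean()

# Timestamp + TimedeltaIndex gives a DatetimeIndex; .time yields datetime.time objects
out.index = (pd.Timestamp('1970-01-01') + out.index).time

print(out.head())
```

Note that the resulting index has object dtype again, so this conversion is best left as the last step, after all time-based grouping is done.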

Alternatively, if you feel unsure of floor, this gives identical output, up to the index name:

df['time'] = df.drange - df.drange.dt.normalize()  # timedelta64
df.groupby(pd.Grouper(key='time', freq='5min'))[['c0', 'c1']].mean()
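As a sanity check that the two approaches agree, here is a sketch on a seeded sample frame. The three-day range and the names offset, by_floor and by_grouper are made up for illustration.

```python
import numpy as np
import pandas as pd

# Three full days of one-minute data with a fixed seed (hypothetical sample)
rng = np.random.default_rng(0)
drange = pd.date_range('2019-08-01 00:00', '2019-08-03 23:59', freq='min')
df = pd.DataFrame({
    'drange': drange,
    'c0': rng.random(len(drange)),
    'c1': rng.random(len(drange)),
})

# Time of day as a timedelta since midnight
offset = df['drange'] - df['drange'].dt.normalize()

# Approach 1: floor the offsets and group by the resulting Series
by_floor = df.groupby(offset.dt.floor('5min'))[['c0', 'c1']].mean()

# Approach 2: store the offsets in a column and let pd.Grouper bin them
df['time'] = offset
by_grouper = df.groupby(pd.Grouper(key='time', freq='5min'))[['c0', 'c1']].mean()

# Same bins and same means; only the index name differs
print(np.allclose(by_floor.to_numpy(), by_grouper.to_numpy()))
```

Each result has one row per 5-minute slot of the day, with every slot averaging over all days in the data.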
