Reputation: 125
I have a dataframe with a datetime column. I want to group by the time component only and aggregate, e.g. by taking the mean.
I know that I can use pd.Grouper to group by date AND time, but it doesn't work on time only.
Say we have the following dataframe:
import numpy as np
import pandas as pd
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
print(df.head())
               drange      time        c0        c1
0 2019-08-01 00:00:00  00:00:00  0.031946  0.159739
1 2019-08-01 00:01:00  00:01:00  0.809171  0.681942
2 2019-08-01 00:02:00  00:02:00  0.036720  0.133443
3 2019-08-01 00:03:00  00:03:00  0.650522  0.409797
4 2019-08-01 00:04:00  00:04:00  0.239262  0.814565
In this case, the following throws a TypeError:
grouper = pd.Grouper(key='time', freq='5min')
grouped = df.groupby(grouper).mean()
I could set key='drange' to group by date and time and then: ... But I wonder whether there is a cleaner way to achieve the same results.
Upvotes: 1
Views: 1475
Reputation: 59549
Series.dt.time / DatetimeIndex.time returns the time as datetime.time. This isn't great because pandas works best with timedelta64, and so your 'time' column is cast to object, losing all datetime functionality.
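To make the dtype difference concrete, here is a minimal sketch on a tiny made-up series (the name ts and its values are illustrative, not taken from the question). It compares the result of .dt.time with the offset from midnight obtained by subtracting the normalized date.

```python
import pandas as pd

# Four one-minute timestamps that cross midnight (arbitrary illustrative values).
ts = pd.Series(pd.date_range('2019-08-01 23:58', periods=4, freq='1min'))

as_time = ts.dt.time                # Python datetime.time objects
as_delta = ts - ts.dt.normalize()   # offset from midnight of the same day

print(as_time.dtype)         # object: the .dt accessor is no longer available
print(as_delta.dtype.kind)   # 'm': a timedelta64 dtype, so .dt.floor etc. still work
print(as_delta.tolist())
```

The third value is a zero timedelta because 00:00 on the next day is again zero minutes past midnight, which is why this representation pools rows from different dates.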
You can subtract off the normalized date to obtain the time as a timedelta, so you can continue to use the datetime tools of pandas. You can floor this to group.
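# Time of day as a timedelta, floored to 5-minute bins
s = (df.drange - df.drange.dt.normalize()).dt.floor('5min')
# Select the numeric columns; the object-dtype 'time' column cannot be averaged
df.groupby(s)[['c0', 'c1']].mean()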
                c0        c1
drange
00:00:00  0.436971  0.530201
00:05:00  0.441387  0.518831
00:10:00  0.465008  0.478130
...            ...       ...
23:45:00  0.523233  0.515991
23:50:00  0.468695  0.434240
23:55:00  0.569989  0.510291
Alternatively, if you feel unsure of floor, this gets the identical output up to the index name:
df['time'] = (df.drange - df.drange.dt.normalize()) # timedelta64[ns]
df.groupby(pd.Grouper(key='time', freq='5min'))[['c0', 'c1']].mean()
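As a rough check that the two variants agree, the sketch below runs both on a small synthetic frame. The names demo, tod, by_floor and by_grouper, the seed, and the two-day range are assumptions made for illustration; only the column name drange follows the question.

```python
import numpy as np
import pandas as pd

# Two days of one-minute data with a fixed seed (illustrative values only).
rng = np.random.default_rng(0)
idx = pd.date_range('2019-08-01', periods=2 * 24 * 60, freq='1min')
demo = pd.DataFrame({'drange': idx, 'c0': rng.random(len(idx))})

# Time of day as a timedelta.
tod = demo['drange'] - demo['drange'].dt.normalize()

# Variant 1: floor the timedelta and group by the resulting Series.
by_floor = demo.groupby(tod.dt.floor('5min'))['c0'].mean()

# Variant 2: store the timedelta in a column and let pd.Grouper bin it.
demo['tod'] = tod
by_grouper = demo.groupby(pd.Grouper(key='tod', freq='5min'))['c0'].mean()

print(len(by_floor), len(by_grouper))
print(np.allclose(by_floor.to_numpy(), by_grouper.to_numpy()))
```

Both results should have one row per 5-minute slot of the day; np.allclose is used instead of exact equality because the two code paths may sum the values in a different order.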
Upvotes: 3
Reputation: 353
When you use DataFrame.groupby you can pass a Series as an argument. Moreover, if your series is a datetime, you can use the series.dt accessor to access the properties of the date. In your case df['drange'].dt.hour or df['drange'].dt.time should do it.
# df['drange']=pd.to_datetime(df['drange'])
df.groupby(df['drange'].dt.hour).agg(...)
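A short sketch of what these keys produce, on synthetic data (the names demo, hourly, per_minute and five_min are illustrative). Note that .dt.hour gives hourly groups and .dt.time gives one group per distinct clock time, so neither yields 5-minute bins by itself. Flooring the timestamps before taking .dt.time is one possible way to get them, offered here as an assumption rather than as part of the answer above.

```python
import numpy as np
import pandas as pd

# Two days of one-minute data with a fixed seed (illustrative values only).
rng = np.random.default_rng(0)
idx = pd.date_range('2019-08-01', periods=2 * 24 * 60, freq='1min')
demo = pd.DataFrame({'drange': idx, 'c0': rng.random(len(idx))})

# One group per hour of the day (0..23), pooled across dates.
hourly = demo.groupby(demo['drange'].dt.hour)['c0'].mean()

# One group per distinct clock time; with one-minute data that is 1440 groups.
per_minute = demo.groupby(demo['drange'].dt.time)['c0'].mean()

# Flooring first gives 5-minute bins keyed by datetime.time objects.
five_min = demo.groupby(demo['drange'].dt.floor('5min').dt.time)['c0'].mean()

print(len(hourly), len(per_minute), len(five_min))  # 24 1440 288
```

The keys in the last two results are plain datetime.time objects, so the index has object dtype and cannot be resampled further.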
Upvotes: 0