ontheway

Reputation: 125

Pandas: How to group by a datetime column, using only the time and discarding the date

I have a dataframe with a datetime column. I want to group by the time component only and aggregate, e.g. by taking the mean.

I know that I can use pd.Grouper to group by date AND time, but it doesn't work on time only.
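For reference, here is a minimal sketch of the date-and-time grouping that does work, on assumed toy data (two days of 1-minute rows, with c0 simply counting rows). Because the key is a datetime64 column, every 5-minute bin is tied to a particular day, so the same clock time on two different days ends up in two different groups:

```python
import numpy as np
import pandas as pd

# Assumed toy data: two days of 1-minute timestamps, c0 counts rows 0, 1, 2, ...
drange = pd.date_range('2019-08-01 00:00', periods=2 * 24 * 60, freq='1min')
df = pd.DataFrame({'drange': drange, 'c0': np.arange(len(drange), dtype=float)})

# Grouping on the datetime64 column works, but each bin includes the date
by_datetime = df.groupby(pd.Grouper(key='drange', freq='5min'))['c0'].mean()
print(len(by_datetime))  # 576: 288 bins per day x 2 days
```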

Say we have the following dataframe:

import numpy as np
import pandas as pd

drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1min')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
print(df.head())
               drange      time        c0        c1
0 2019-08-01 00:00:00  00:00:00  0.031946  0.159739
1 2019-08-01 00:01:00  00:01:00  0.809171  0.681942
2 2019-08-01 00:02:00  00:02:00  0.036720  0.133443
3 2019-08-01 00:03:00  00:03:00  0.650522  0.409797
4 2019-08-01 00:04:00  00:04:00  0.239262  0.814565

In this case, the following throws a TypeError:

grouper = pd.Grouper(key='time', freq='5min')
grouped = df.groupby(grouper).mean()

I could set key='drange' to group by date and time and then:

... But I wonder whether there is a cleaner way to achieve the same results.

Upvotes: 1

Views: 1475

Answers (2)

ALollz

Reputation: 59549

Series.dt.time and DatetimeIndex.time return the time as datetime.time objects. This isn't great, because pandas works best with timedelta64; your 'time' column is therefore stored as object dtype and loses all datetime functionality, which is why pd.Grouper raises a TypeError on it.
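A minimal sketch of the dtype difference described above, on a few assumed timestamps: .dt.time yields an object column, while subtracting the normalized date yields a timedelta64 column that pandas can floor, resample and group by frequency.

```python
import pandas as pd

# Assumed toy data: three consecutive minutes
drange = pd.Series(pd.date_range('2019-08-01 00:00', periods=3, freq='1min'))

as_time = drange.dt.time                        # datetime.time objects
as_timedelta = drange - drange.dt.normalize()   # time elapsed since midnight

print(as_time.dtype)       # object
print(as_timedelta.dtype)  # a timedelta64 dtype
```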

Instead, subtract the normalized date (midnight of the same day) from each timestamp to obtain the time of day as a timedelta, so you can keep using pandas' datetime tools. You can then floor it to the bin size you want and group by the result.

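# time of day as timedelta64, floored to 5-minute bins
s = (df.drange - df.drange.dt.normalize()).dt.floor('5min')

# select the numeric columns; the object 'time' column cannot be averaged
df.groupby(s)[['c0', 'c1']].mean()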

                c0        c1
drange                      
00:00:00  0.436971  0.530201
00:05:00  0.441387  0.518831
00:10:00  0.465008  0.478130
...            ...       ...
23:45:00  0.523233  0.515991
23:50:00  0.468695  0.434240
23:55:00  0.569989  0.510291
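To check the floor-based grouping on data whose answer is known in advance, here is a small sketch with assumed toy data (two days of 1-minute rows, c0 counting rows). The first bin, 00:00:00, collects rows 0 to 4 from day one and rows 1440 to 1444 from day two, so its mean is 722.0:

```python
import numpy as np
import pandas as pd

# Assumed toy data: two days of 1-minute timestamps, c0 counts rows 0, 1, 2, ...
drange = pd.date_range('2019-08-01 00:00', periods=2 * 24 * 60, freq='1min')
df = pd.DataFrame({'drange': drange, 'c0': np.arange(len(drange), dtype=float)})

# Time of day as a timedelta, floored to 5-minute bins (the date is discarded)
tod = (df['drange'] - df['drange'].dt.normalize()).dt.floor('5min')

result = df.groupby(tod)['c0'].mean()
print(len(result))     # 288: one bin per 5 minutes of the day
print(result.iloc[0])  # 722.0
```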

Alternatively, if you'd rather not use floor, you can pass the timedelta column to pd.Grouper directly; this gives identical output apart from the index name:

df['time'] = df.drange - df.drange.dt.normalize()  # timedelta64[ns], replaces the datetime.time column
df.groupby(pd.Grouper(key='time', freq='5min'))[['c0', 'c1']].mean()
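A quick sketch to confirm that the two approaches agree, on assumed random toy data. The comparison ignores names and the index freq attribute, since the Grouper (resample-based) result carries a 5-minute freq on its TimedeltaIndex while the floor-based groupby result does not:

```python
import numpy as np
import pandas as pd

# Assumed toy data: two days of 1-minute timestamps with random values
drange = pd.date_range('2019-08-01 00:00', periods=2 * 24 * 60, freq='1min')
rng = np.random.default_rng(0)
df = pd.DataFrame({'drange': drange, 'c0': rng.random(len(drange))})

df['time'] = df['drange'] - df['drange'].dt.normalize()  # timedelta64

via_floor = df.groupby(df['time'].dt.floor('5min'))['c0'].mean()
via_grouper = df.groupby(pd.Grouper(key='time', freq='5min'))['c0'].mean()

# Raises AssertionError if bins or values differ
pd.testing.assert_series_equal(via_floor, via_grouper,
                               check_names=False, check_freq=False)
```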

Upvotes: 3

federicober

Reputation: 353

When you use DataFrame.groupby, you can pass a Series as the argument. Moreover, if the Series has a datetime dtype, you can use its .dt accessor to get at the date and time properties. In your case, df['drange'].dt.hour (one group per hour of the day) or df['drange'].dt.time (one group per exact time of day) should do it.

# only needed if 'drange' is not already datetime64:
# df['drange'] = pd.to_datetime(df['drange'])
df.groupby(df['drange'].dt.hour).agg(...)
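A minimal sketch of both keys, assuming mean as the aggregation and the same kind of toy data (two days of 1-minute rows, c0 counting rows). Note that dt.time groups by the exact minute rather than by 5-minute bins, and its keys are datetime.time objects (object dtype):

```python
import numpy as np
import pandas as pd

# Assumed toy data: two days of 1-minute timestamps, c0 counts rows 0, 1, 2, ...
drange = pd.date_range('2019-08-01 00:00', periods=2 * 24 * 60, freq='1min')
df = pd.DataFrame({'drange': drange, 'c0': np.arange(len(drange), dtype=float)})

# One group per hour of the day (0-23); dates are ignored
by_hour = df.groupby(df['drange'].dt.hour)['c0'].mean()

# One group per distinct time of day; keys are datetime.time objects
by_time = df.groupby(df['drange'].dt.time)['c0'].mean()

print(len(by_hour))  # 24
print(len(by_time))  # 1440
```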

Upvotes: 0
