Reputation: 3313
I have a dataframe with data for each minute; it also contains a date column that keeps track of the date in timestamp format.
I'm trying to aggregate the data by hour instead of by minute.
I tried the following code, which works, but it requires indexing on the date column, which I don't want because then I cannot loop through the dataframe using df.loc.
import pandas as pd
import numpy as np

# minute-level sample data
date_rng = pd.date_range(start='1/1/2018', end='1/08/2018', freq='T')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))

# resample needs a DatetimeIndex, so the date column has to become the index
df = df.set_index('date')
df = df.resample('H').sum()
df.head(15)
I also tried groupby, but it's not working; here is the code.
df.groupby([df.date.dt.hour]).data.sum()
print(df.head(15))
How can I group by date without setting it as the index?
Thanks.
Upvotes: 1
Views: 1651
Reputation: 5502
Try pd.Grouper and specify the freq parameter:
df.groupby([pd.Grouper(key='date', freq='1H')]).sum()
Full code:
import pandas as pd
from datetime import datetime
import numpy as np
date_rng = pd.date_range(start='1/1/2018', end='1/08/2018', freq='T')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))
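# group on the 'date' column directly; it never has to become the index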
print(df.groupby([pd.Grouper(key='date', freq='1H')]).sum())
# data
# date
# 2018-01-01 00:00:00 2958
# 2018-01-01 01:00:00 3084
# 2018-01-01 02:00:00 2991
# 2018-01-01 03:00:00 3021
# 2018-01-01 04:00:00 2894
# ... ...
# 2018-01-07 20:00:00 2863
# 2018-01-07 21:00:00 2850
# 2018-01-07 22:00:00 2823
# 2018-01-07 23:00:00 2805
# 2018-01-08 00:00:00 25
# [169 rows x 1 columns]
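If you still want to loop over the result with df.loc on a plain date column, a minimal sketch (using the same df as above; the hourly/hourly2 names are just placeholders): call .reset_index() on the grouped result to turn date back into a regular column, or pass the column to resample via its on parameter so the index is never touched.
# keep 'date' as a regular column after the hourly aggregation
hourly = df.groupby(pd.Grouper(key='date', freq='1H')).sum().reset_index()
print(hourly.loc[0])  # row lookup by label works on the default RangeIndex

# alternative: resample directly on the column, no set_index needed
hourly2 = df.resample('1H', on='date').sum().reset_index()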
Hope that helps!
Upvotes: 1