Unnikrishnan

Reputation: 3313

How can I group a timestamp column by hour and aggregate the rows in a pandas DataFrame?

I have a DataFrame with one row per minute. It also contains a date column that holds the timestamp of each row.

I'm trying to aggregate the data by hour instead of by minute.

I tried the following code, which works, but it requires setting the date column as the index. I don't want that, because then I can't loop through the DataFrame with df.loc.

import pandas as pd
import numpy as np

date_rng = pd.date_range(start='1/1/2018', end='1/08/2018', freq='T')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))

# set_index returns a new DataFrame, so the result has to be assigned
df = df.set_index('date')
# the index is already a DatetimeIndex here, so it can be resampled directly
df = df.resample('H').sum()
df.head(15)

I also tried groupby, but it doesn't give the result I want. Here is the code:

df.groupby([df.date.dt.hour]).data.sum()
print(df.head(15))

How can I group by date without setting it as the index?

Thanks.

Upvotes: 1

Views: 1651

Answers (1)

Alexandre B.

Reputation: 5502

Try pd.Grouper and specify the freq parameter:

df.groupby([pd.Grouper(key='date', freq='1H')]).sum()
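For context, the groupby attempt in the question differs in two ways: df.date.dt.hour is the hour of the day (0 to 23), so all days get merged into 24 groups, and the result of groupby is never assigned, so print(df.head(15)) still shows the original rows. A minimal sketch of the contrast, assuming the same one-week, one-row-per-minute data as the question (the seed and variable names are only illustrative, and the frequency is written as '60min' because the spelling of the hourly alias differs between pandas versions):

```python
import numpy as np
import pandas as pd

# Illustrative data: one row per minute for a week, seeded for repeatability.
rng = np.random.default_rng(0)
date_rng = pd.date_range(start="2018-01-01", end="2018-01-08", freq="min")
df = pd.DataFrame({"date": date_rng, "data": rng.integers(0, 100, size=len(date_rng))})

# Hour of day: one group per clock hour (0..23), with all days merged together.
by_hour_of_day = df.groupby(df["date"].dt.hour)["data"].sum()

# Calendar hour: each timestamp is truncated to the start of its own hour.
by_calendar_hour = df.groupby(df["date"].dt.floor("60min"))["data"].sum()

print(len(by_hour_of_day))    # 24
print(len(by_calendar_hour))  # 169
```

Grouping on the floored timestamps should give the same hourly buckets as pd.Grouper for this data, since every hour in the range contains at least one row.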

Full code:

import pandas as pd
from datetime import datetime
import numpy as np

date_rng = pd.date_range(start='1/1/2018', end='1/08/2018', freq='T')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=(len(date_rng)))

print(df.groupby([pd.Grouper(key='date', freq='1H')]).sum())
#                      data
# date
# 2018-01-01 00:00:00  2958
# 2018-01-01 01:00:00  3084
# 2018-01-01 02:00:00  2991
# 2018-01-01 03:00:00  3021
# 2018-01-01 04:00:00  2894
# ...                   ...
# 2018-01-07 20:00:00  2863
# 2018-01-07 21:00:00  2850
# 2018-01-07 22:00:00  2823
# 2018-01-07 23:00:00  2805
# 2018-01-08 00:00:00    25

# [169 rows x 1 columns]
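The grouped result above still has date as its index. If the goal is to keep date as an ordinary column so that df.loc works with integer labels, one option is to call reset_index() on the result. This is a sketch under the same assumptions as the full code (the seed and the name hourly are illustrative; '60min' is used so the frequency string behaves the same across pandas versions):

```python
import numpy as np
import pandas as pd

# Same shape of data as the full code above, seeded so the run is repeatable.
rng = np.random.default_rng(0)
date_rng = pd.date_range(start="2018-01-01", end="2018-01-08", freq="min")
df = pd.DataFrame({"date": date_rng, "data": rng.integers(0, 100, size=len(date_rng))})

# Hourly sums, then move 'date' back from the index into a regular column.
hourly = (
    df.groupby(pd.Grouper(key="date", freq="60min"))["data"]
    .sum()
    .reset_index()
)

# hourly has a default 0..n-1 integer index, so label-based lookups work.
for i in range(3):
    print(hourly.loc[i, "date"], hourly.loc[i, "data"])
```

The original df is left unchanged, so the per-minute rows remain available next to the hourly table.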

Hope that helps!

Upvotes: 1
