user12729159
user12729159

Reputation: 141

Converting minutes data to hourly data

I have a dataframe with thousands of unique id,s and values recorded at 5 min time interval for single day. The data is large because it contains per day data for thousands of unique ids. The data sample is shown below:

sample_data.csv

datetime,           Unique_id,   Value
2018-01-28 00:00:00, 105714,      409
2018-01-28 00:05:00, 105714,      409
2018-01-28 00:10:00, 105714,      242
2018-01-28 00:15:00, 105714,      225
....................  .......     ...
2018-01-28 23:55:00, 105714,      225
2018-01-28 00:00:00, 206714,      325
2018-01-28 00:05:00, 206714,      325
2018-01-28 00:10:00, 206714       238
....................  .......     ...
2018-01-28 23:55:00, 206714,      410

I tried to convert minutes to hours by:

pd.to_datetime(df.datetime, unit='m').dt.strftime('%H:%M')

But I am not sure how to achieve parallelly the aggregate values of last column (VALUE) in terms of hours.

Expected output

datetime,           Unique_id,   Value
2018-01-28 01(hours):00(minutes), 105714,      Sum of 5 minute values in one hour
2018-01-28 02(hours):00(minutes), 105714,      Sum of 5 minute values in one hour

Upvotes: 2

Views: 4442

Answers (2)

jezrael
jezrael

Reputation: 863651

Use Grouper for aggregate by hours with column Unique_id by sum:

df['datetime'] = pd.to_datetime(df['datetime'])
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'Unique_id']).sum().reset_index()
print (df)
             datetime  Unique_id  Value
0 2018-01-28 00:00:00     105714   1285
1 2018-01-28 00:00:00     206714    888
2 2018-01-28 23:00:00     105714    225
3 2018-01-28 23:00:00     206714    410

Upvotes: 4

Cimbali
Cimbali

Reputation: 11415

Simple groupby + sum, using the time rounded (actually floored) to the hour:

>>> df.groupby([
...     df['datetime'].transform(pd.to_datetime).dt.floor(freq='1h'),
...     'Unique_id',
... ]).sum().reset_index()
             datetime  Unique_id   Value
0 2018-01-28 00:00:00     105714  1285.0
1 2018-01-28 00:00:00     206714   888.0
2 2018-01-28 23:00:00     105714   225.0
3 2018-01-28 23:00:00     206714   410.0

Note that .transform(pd.to_datetime) is optional if your column is already a datetime dtype.

Upvotes: 1

Related Questions