Reputation: 141
I have a dataframe with thousands of unique IDs and values recorded at 5-minute intervals over a single day. The data is large because it contains a full day of data for thousands of unique IDs. A data sample is shown below:
sample_data.csv
datetime, Unique_id, Value
2018-01-28 00:00:00, 105714, 409
2018-01-28 00:05:00, 105714, 409
2018-01-28 00:10:00, 105714, 242
2018-01-28 00:15:00, 105714, 225
.................... ....... ...
2018-01-28 23:55:00, 105714, 225
2018-01-28 00:00:00, 206714, 325
2018-01-28 00:05:00, 206714, 325
2018-01-28 00:10:00, 206714, 238
.................... ....... ...
2018-01-28 23:55:00, 206714, 410
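For reference, the sample rows shown above can be rebuilt as a DataFrame like this (a minimal sketch; the elided "..." rows are omitted):

import pandas as pd

# Only the rows visible in the sample above; the elided rows are left out
df = pd.DataFrame({
    'datetime': ['2018-01-28 00:00:00', '2018-01-28 00:05:00',
                 '2018-01-28 00:10:00', '2018-01-28 00:15:00',
                 '2018-01-28 23:55:00', '2018-01-28 00:00:00',
                 '2018-01-28 00:05:00', '2018-01-28 00:10:00',
                 '2018-01-28 23:55:00'],
    'Unique_id': [105714] * 5 + [206714] * 4,
    'Value': [409, 409, 242, 225, 225, 325, 325, 238, 410],
})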
I tried to convert minutes to hours by:
pd.to_datetime(df.datetime, unit='m').dt.strftime('%H:%M')
But I am not sure how to aggregate the values of the last column (Value) per hour at the same time.
Expected output
datetime, Unique_id, Value
2018-01-28 01(hours):00(minutes), 105714, Sum of the 5-minute values in that hour
2018-01-28 02(hours):00(minutes), 105714, Sum of the 5-minute values in that hour
Upvotes: 2
Views: 4442
Reputation: 863651
Use Grouper to aggregate by hour, grouped together with the Unique_id column, taking the sum:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'Unique_id']).sum().reset_index()
print (df)
datetime Unique_id Value
0 2018-01-28 00:00:00 105714 1285
1 2018-01-28 00:00:00 206714 888
2 2018-01-28 23:00:00 105714 225
3 2018-01-28 23:00:00 206714 410
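If the real frame has more columns than the sample, a sketch of the same Grouper approach that selects just the Value column before summing (assuming the extra columns should not be aggregated):

df['datetime'] = pd.to_datetime(df['datetime'])
# group into hourly bins per id, then sum only the Value column
df = (df.groupby([pd.Grouper(freq='H', key='datetime'), 'Unique_id'])['Value']
        .sum()
        .reset_index())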
Upvotes: 4
Reputation: 11415
Simple groupby + sum, using the time rounded (actually floored) to the hour:
>>> df.groupby([
... df['datetime'].transform(pd.to_datetime).dt.floor(freq='1h'),
... 'Unique_id',
... ]).sum().reset_index()
datetime Unique_id Value
0 2018-01-28 00:00:00 105714 1285.0
1 2018-01-28 00:00:00 206714 888.0
2 2018-01-28 23:00:00 105714 225.0
3 2018-01-28 23:00:00 206714 410.0
Note that .transform(pd.to_datetime) is optional if your column is already of datetime dtype.
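For example, with an already-parsed column, a minimal equivalent sketch:

df['datetime'] = pd.to_datetime(df['datetime'])  # parse once up front
# floor each timestamp to its hour, then sum Value per (hour, id)
out = (df.groupby([df['datetime'].dt.floor('1h'), 'Unique_id'])['Value']
         .sum()
         .reset_index())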
Upvotes: 1