Reputation: 141
I have a dataframe with thousands of unique IDs and values recorded at 5-minute intervals over a single day. The data is large because it contains a full day of data for thousands of unique IDs. A data sample is shown below:
sample_data.csv
datetime, Unique_id, Value
2018-01-28 00:00:00, 105714, 409
2018-01-28 00:05:00, 105714, 409
2018-01-28 00:10:00, 105714, 242
2018-01-28 00:15:00, 105714, 225
.................... ....... ...
2018-01-28 23:55:00, 105714, 225
2018-01-28 00:00:00, 206714, 325
2018-01-28 00:05:00, 206714, 325
2018-01-28 00:10:00, 206714, 238
.................... ....... ...
2018-01-28 23:55:00, 206714, 410
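For reference, the sample rows shown above can be rebuilt as a DataFrame like this (a minimal sketch; the elided "..." rows are omitted):

import pandas as pd

# Only the rows visible in the sample above; the elided rows are left out
df = pd.DataFrame({
    'datetime': ['2018-01-28 00:00:00', '2018-01-28 00:05:00',
                 '2018-01-28 00:10:00', '2018-01-28 00:15:00',
                 '2018-01-28 23:55:00', '2018-01-28 00:00:00',
                 '2018-01-28 00:05:00', '2018-01-28 00:10:00',
                 '2018-01-28 23:55:00'],
    'Unique_id': [105714] * 5 + [206714] * 4,
    'Value': [409, 409, 242, 225, 225, 325, 325, 238, 410],
})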
I tried to convert minutes to hours by:
pd.to_datetime(df.datetime, unit='m').dt.strftime('%H:%M')
But I am not sure how to aggregate the values of the last column (Value) per hour at the same time.
Expected output
datetime, Unique_id, Value
2018-01-28 01(hours):00(minutes), 105714, Sum of the 5-minute values in that hour
2018-01-28 02(hours):00(minutes), 105714, Sum of the 5-minute values in that hour
Upvotes: 2
Views: 4442
Reputation: 863651
Use Grouper to aggregate by hour, grouped together with the Unique_id column, taking the sum:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.groupby([pd.Grouper(freq='H', key='datetime'), 'Unique_id']).sum().reset_index()
print (df)
datetime Unique_id Value
0 2018-01-28 00:00:00 105714 1285
1 2018-01-28 00:00:00 206714 888
2 2018-01-28 23:00:00 105714 225
3 2018-01-28 23:00:00 206714 410
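If the real frame has more columns than the sample, a sketch of the same Grouper approach that selects just the Value column before summing (assuming the extra columns should not be aggregated):

df['datetime'] = pd.to_datetime(df['datetime'])
# group into hourly bins per id, then sum only the Value column
df = (df.groupby([pd.Grouper(freq='H', key='datetime'), 'Unique_id'])['Value']
        .sum()
        .reset_index())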
Upvotes: 4
Reputation: 11415
Simple groupby + sum, using the time rounded (actually floored) to the hour:
>>> df.groupby([
... df['datetime'].transform(pd.to_datetime).dt.floor(freq='1h'),
... 'Unique_id',
... ]).sum().reset_index()
datetime Unique_id Value
0 2018-01-28 00:00:00 105714 1285.0
1 2018-01-28 00:00:00 206714 888.0
2 2018-01-28 23:00:00 105714 225.0
3 2018-01-28 23:00:00 206714 410.0
Note that .transform(pd.to_datetime) is optional if your column is already of datetime dtype.
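For example, with an already-parsed column, a minimal equivalent sketch:

df['datetime'] = pd.to_datetime(df['datetime'])  # parse once up front
# floor each timestamp to its hour, then sum Value per (hour, id)
out = (df.groupby([df['datetime'].dt.floor('1h'), 'Unique_id'])['Value']
         .sum()
         .reset_index())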
Upvotes: 1