Reputation: 894
I have a pandas DataFrame with columns of numbers indexed by datetime, like so:
ABC XYZ
2016-12-01 06:43:02.541287-06:00 2 11
2016-12-01 06:43:06.341349-06:00 52 3
2016-12-01 06:43:11.745070-06:00 16 0
2016-12-01 06:43:18.245247-06:00 9 100
2016-12-01 06:43:25.545450-06:00 0 75
2016-12-01 06:43:35.953279-06:00 4 0
2016-12-01 06:43:45.553310-06:00 13 8
2016-12-01 06:43:55.053600-06:00 3 12
2016-12-01 06:43:59.153890-06:00 6 9
I would like to "groupby" every 10 seconds and compute the sum of ABC and XYZ, and generate a new dataframe with those sums that looks like this:
ABC XYZ
2016-12-01 06:43:10.000000-06:00 54 14
2016-12-01 06:43:20.000000-06:00 25 100
2016-12-01 06:43:30.000000-06:00 0 75
2016-12-01 06:43:40.000000-06:00 4 0
2016-12-01 06:43:50.000000-06:00 13 8
2016-12-01 06:44:00.000000-06:00 9 21
i.e. the first two rows are within the same 10 second period, so for ABC I add 2+52 to get 54 and for XYZ I add 11+3 to get 14. Then I put these sums in the new DataFrame. My code is so horribly kludgey that I don't even want to include it. Any suggestions would be much appreciated.
Upvotes: 1
Views: 457
Reputation: 863651
Use resample
with sum
:
df = df.resample('10S').sum()
print (df)
ABC XYZ
2016-12-01 12:43:00 54 14
2016-12-01 12:43:10 25 100
2016-12-01 12:43:20 0 75
2016-12-01 12:43:30 4 0
2016-12-01 12:43:40 13 8
2016-12-01 12:43:50 9 21
But if need work with timezones, first tz_localize
to default UTC
and then convert to -6
timezone (like US/Indiana-Starke
) by tz_convert
:
df = df.tz_localize('UTC').resample('10S').sum().tz_convert('US/Indiana-Starke')
print (df)
ABC XYZ
2016-12-01 06:43:00-06:00 54 14
2016-12-01 06:43:10-06:00 25 100
2016-12-01 06:43:20-06:00 0 75
2016-12-01 06:43:30-06:00 4 0
2016-12-01 06:43:40-06:00 13 8
2016-12-01 06:43:50-06:00 9 21
Upvotes: 1