acb

Reputation: 625

Pandas group hourly data into daily sums with date index

I am working on code that takes hourly data for a month and groups it into daily (24 hour) sums. My problem is that I would like the index to show the date and year, but I am just getting an index of 1-30.

The code I am using is

df = df.iloc[:,16:27].groupby([lambda x: x.day]).sum()

example of output I am getting

DateTime     data
1            1772.031568
2            19884.42243
3            28696.72159
4            24906.20355
5            9059.120325

example of output I would like

DateTime     data
1/1/2017     1772.031568
1/2/2017     19884.42243
1/3/2017     28696.72159
1/4/2017     24906.20355
1/5/2017     9059.120325

Upvotes: 8

Views: 8295

Answers (5)

Marius

Reputation: 413

This is an old question, but I don't think the accepted solution is the best in this particular case. What you want to accomplish is to downsample time series data, and pandas has built-in functionality for this called resample(). For your example you would do:

df = df.iloc[:,16:27].resample('D').sum()

or, if the datetime column is not the index:

df = df.iloc[:,16:27].resample('D', on='datetime_column_name').sum()

There are (at least) two benefits of doing it this way as opposed to the accepted answer:

  1. resample() can both upsample and downsample, while groupby() can only downsample
  2. No lambdas, list comprehensions or date formatting functions are required.

For more information and examples, see the documentation for resample().
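As a minimal, self-contained sketch (the hourly date range and the column name 'data' are made up for illustration, not the asker's actual frame):

import pandas as pd
import numpy as np

# Hypothetical hourly data for illustration only
idx = pd.date_range('2017-01-01', periods=120, freq='h')
df = pd.DataFrame({'data': np.random.rand(120)}, index=idx)

# Downsample the hourly values to one summed row per calendar day
daily = df.resample('D').sum()
print(daily.head())
# Index now reads 2017-01-01, 2017-01-02, ...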

Upvotes: 9

The Aelfinn

Reputation: 17118

First you need to create an index on your datetime column to expose functions that break the datetime into smaller pieces efficiently (like the year and month of the datetime).

Next, you need to group by the year, month and day of the index if you want to apply an aggregate method (like sum()) to each day of the year, and retain separate aggregations for each day.

Renaming each grouping key with rename() and then calling reset_index() gives the grouped categories readable names and "flattens" the data, turning each category into an actual column on the resulting dataframe.

import pandas as pd

# 'df.created_at' is the datetime column in your dataframe
date_index = pd.DatetimeIndex(df.created_at)

counted = df.groupby([date_index.year.rename('year'),
                      date_index.month.rename('month'),
                      date_index.day.rename('day')]) \
            .agg({'column_to_sum': 'sum'}) \
            .reset_index()

# Resulting dataframe has columns "year", "month", "day" and "column_to_sum" available

Upvotes: 1

plasmon360

Reputation: 4199

If your index is not a datetime object:

import pandas as pd

df = pd.DataFrame({'data': [1772.031568, 19884.42243, 28696.72159, 24906.20355, 9059.120325]},
                  index=[1, 2, 3, 4, 5])
print(df.head())

# build a daily date range of the same length and use it as the new index
rng = pd.date_range('1/1/2017', periods=len(df.index), freq='D')
df.set_index(rng, inplace=True)
print(df.head())

will result in

           data
1   1772.031568
2  19884.422430
3  28696.721590
4  24906.203550
5   9059.120325
                    data
2017-01-01   1772.031568
2017-01-02  19884.422430
2017-01-03  28696.721590
2017-01-04  24906.203550
2017-01-05   9059.120325

Upvotes: 2

Teudimundo

Reputation: 2670

You can exploit pandas' DatetimeIndex:

working_df = df.iloc[:, 16:27]
result = working_df.groupby(pd.DatetimeIndex(working_df.DateTime).date).sum()

This works if your DateTime column is actually a datetime (and be careful with the timezone).

This way you will have valid dates in the index, so you can easily do other manipulations.
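A minimal sketch of the idea, assuming a frame with a proper datetime column (the column names and date range here are made up):

import pandas as pd
import numpy as np

# Hypothetical frame with a real datetime column, purely for illustration
df = pd.DataFrame({
    'DateTime': pd.date_range('2017-01-01', periods=72, freq='h'),
    'data': np.random.rand(72),
})

# Group by the calendar date extracted from the datetime column
result = df.groupby(pd.DatetimeIndex(df.DateTime).date)['data'].sum()
print(result.head())
# The index entries are datetime.date objects: 2017-01-01, 2017-01-02, ...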

Upvotes: 0

Nicolás Ozimica

Reputation: 9758

If your index is a datetime, you can build a combined groupby clause:

df = df.iloc[:,16:27].groupby([lambda x: "{}/{}/{}".format(x.day, x.month, x.year)]).sum()

or even better:

df = df.iloc[:,16:27].groupby([lambda x: x.strftime("%d%m%Y")]).sum()
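For instance, with a made-up hourly index (assumed purely for illustration), the second variant gives a string-keyed daily sum:

import pandas as pd
import numpy as np

# Hypothetical hourly frame; only the datetime index matters for the groupby
idx = pd.date_range('2017-01-01', periods=72, freq='h')
df = pd.DataFrame({'data': np.random.rand(72)}, index=idx)

daily = df.groupby(lambda x: x.strftime("%d%m%Y")).sum()
print(daily.head())
# Index values are strings such as '01012017', '02012017', ...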

Upvotes: 4
