Reputation: 625
I am working on a code that takes hourly data for a month and groups it into 24 hour sums. My problem is that I would like the index to read the date/year and I am just getting an index of 1-30.
The code I am using is
df = df.iloc[:,16:27].groupby([lambda x: x.day]).sum()
example of output I am getting
DateTime data
1 1772.031568
2 19884.42243
3 28696.72159
4 24906.20355
5 9059.120325
example of output I would like
DateTime data
1/1/2017 1772.031568
1/2/2017 19884.42243
1/3/2017 28696.72159
1/4/2017 24906.20355
1/5/2017 9059.120325
Upvotes: 8
Views: 8295
Reputation: 413
This is an old question, but I don't think the accepted solution is the best in this particular case. What you want to accomplish is to down sample time series data, and Pandas has built-in functionality for this called resample(). For your example you will do:
df = df.iloc[:,16:27].resample('D').sum()
or if the datetime column is not the index
df = df.iloc[:,16:27].resample('D', on='datetime_column_name').sum()
There are (at least) 2 benefits from doing it this way as opposed to accepted answer:
For more information and examples, see documentation here: resample()
Upvotes: 9
Reputation: 17118
First you need to create an index on your datetime column to expose functions that break the datetime into smaller pieces efficiently (like the year and month of the datetime).
Next, you need to group by the year, month and day of the index if you want to apply an aggregate method (like sum()
) to each day of the year, and retain separate aggregations for each day.
The reset_index()
and rename()
functions allow us to rename our group_by categories to their original names. This "flattens" out our data, making the category an actual column on the resulting dataframe.
import pandas as pd
date_index = pd.DatetimeIndex(df.created_at)
# 'df.created_at' is the datetime column in your dataframe
counted = df.group_by([date_index.year, date_index.month, date_index.day])\
.agg({'column_to_sum': 'sum'})\
.reset_index()\
.rename(columns={'level_1': 'year',
'level_2': 'month',
'level_3': 'day'})
# Resulting dataframe has columns "column_to_sum", "year", "month", "day" available
Upvotes: 1
Reputation: 4199
if your index was not datetime object.
import pandas as pd
df = pd.DataFrame({'data': [1772.031568, 19884.42243,28696.72159, 24906.20355,9059.120325]},index=[1,2,3,4,5])
print df.head()
rng = pd.date_range('1/1/2017',periods =len(df.index), freq='D')
df.set_index(rng,inplace=True)
print df.head()
will result in
data
1 1772.031568
2 19884.422430
3 28696.721590
4 24906.203550
5 9059.120325
data
2017-01-01 1772.031568
2017-01-02 19884.422430
2017-01-03 28696.721590
2017-01-04 24906.203550
2017-01-05 9059.120325
Upvotes: 2
Reputation: 2670
You can exploit panda's DatetimeIndex
:
working_df=df.iloc[:, 16:27]
result = working_df.groupby(pd.DatetimeIndex(working_df.DateTime)).date).sum()
This if you DateTime column is actually DateTime (and be careful of the timezone).
In this way you will have valid datetime in the index, so that you can easily do other manipulations.
Upvotes: 0
Reputation: 9758
If your index is a datetime
, you can build a combined groupby
clause:
df = df.iloc[:,16:27].groupby([lambda x: "{}/{}/{}".format(x.day, x.month, x.year)]).sum()
or even better:
df = df.iloc[:,16:27].groupby([lambda x: x.strftime("%d%m%Y")]).sum()
Upvotes: 4