cattt84

Reputation: 951

How to resample a pandas.DataFrame (a week) to an averaged day

I have data from several days (or even weeks), taken at exactly the same time intervals each day, and want to compute an average daily time curve. I tried a daily average so far, but then I got one average value for each day. What I need is one value per available time of day, averaged over all available days. This is most probably easy once you know the right commands, but unfortunately I am quite new to pandas. Even just a hint on where to look in the documentation would be great!

Time                   some value

2010-08-31 12:30:00    33.910
2010-08-31 12:40:00    33.250
2010-08-31 12:50:00    30.500
2010-08-31 13:00:00    27.065
2010-08-31 13:10:00    25.610
...

2013-06-07 02:10:00    16.970
2013-06-07 02:20:00    16.955
2013-06-07 02:30:00    17.000
2013-06-07 02:40:00    17.015
2013-06-07 02:50:00    16.910

Upvotes: 1

Views: 961

Answers (1)

jezrael

Reputation: 862731

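You can group by hour and minute and use transform('mean'). In the ten-row sample below every time of day occurs only once, so each mean equals the original value: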

print(df)
                  Time  some value
0  2010-08-31 12:30:00      33.910
1  2010-08-31 12:40:00      33.250
2  2010-08-31 12:50:00      30.500
3  2010-08-31 13:00:00      27.065
4  2010-08-31 13:10:00      25.610
5  2013-06-07 02:10:00      16.970
6  2013-06-07 02:20:00      16.955
7  2013-06-07 02:30:00      17.000
8  2013-06-07 02:40:00      17.015
9  2013-06-07 02:50:00      16.910

# convert the Time column to datetime
df['Time'] = pd.to_datetime(df['Time'])
# use the Time column as the index
df = df.set_index('Time')
print(df)
                     some value
Time                           
2010-08-31 12:30:00      33.910
2010-08-31 12:40:00      33.250
2010-08-31 12:50:00      30.500
2010-08-31 13:00:00      27.065
2010-08-31 13:10:00      25.610
2013-06-07 02:10:00      16.970
2013-06-07 02:20:00      16.955
2013-06-07 02:30:00      17.000
2013-06-07 02:40:00      17.015
2013-06-07 02:50:00      16.910
print(df.groupby([df.index.hour, df.index.minute])['some value'].transform('mean'))
Time
2010-08-31 12:30:00    33.910
2010-08-31 12:40:00    33.250
2010-08-31 12:50:00    30.500
2010-08-31 13:00:00    27.065
2010-08-31 13:10:00    25.610
2013-06-07 02:10:00    16.970
2013-06-07 02:20:00    16.955
2013-06-07 02:30:00    17.000
2013-06-07 02:40:00    17.015
2013-06-07 02:50:00    16.910
dtype: float64
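
Since the question asks for one value per time of day rather than one value per row, here is a minimal sketch of the aggregating variant, using mean() instead of transform('mean'). The data is synthetic (three made-up days sampled every 10 minutes), and the name avg_day is only illustrative:

```python
import pandas as pd

# Synthetic data (an assumption, not the asker's data): three full days
# sampled every 10 minutes, where each day's values equal the day number 0, 1, 2.
idx = pd.date_range("2013-06-05 00:00", periods=3 * 144, freq="10min")
df = pd.DataFrame(
    {"some value": [float(i // 144) for i in range(len(idx))]},
    index=idx,
)

# Group on the time of day only, so the same clock time on different
# days falls into the same group, then average each group.
avg_day = df.groupby(df.index.time)["some value"].mean()

print(avg_day.head())
```

The result has one row per distinct time of day (144 rows here), indexed by datetime.time objects, which is the "average day" curve.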

The next solution does not set a DatetimeIndex. Instead it uses dt.hour and dt.minute on the Time column and stores the result in a new column, newCol:

print(df)
                  Time  some value
0  2010-08-31 12:30:00      33.910
1  2010-08-31 12:40:00      33.250
2  2010-08-31 12:50:00      30.500
3  2010-08-31 13:00:00      27.065
4  2010-08-31 13:10:00      25.610
5  2013-06-07 02:10:00      16.970
6  2013-06-07 02:20:00      16.955
7  2013-06-07 02:30:00      17.000
8  2013-06-07 02:40:00      17.015
9  2013-06-07 02:50:00      16.910

# convert the Time column to datetime
df['Time'] = pd.to_datetime(df['Time'])
print(df)
                 Time  some value
0 2010-08-31 12:30:00      33.910
1 2010-08-31 12:40:00      33.250
2 2010-08-31 12:50:00      30.500
3 2010-08-31 13:00:00      27.065
4 2010-08-31 13:10:00      25.610
5 2013-06-07 02:10:00      16.970
6 2013-06-07 02:20:00      16.955
7 2013-06-07 02:30:00      17.000
8 2013-06-07 02:40:00      17.015
9 2013-06-07 02:50:00      16.910
# parentheses allow the method chain to continue on the next line
df['newCol'] = (df.groupby([df['Time'].dt.hour, df['Time'].dt.minute])['some value']
                  .transform('mean'))
print(df)
                 Time  some value  newCol
0 2010-08-31 12:30:00      33.910  33.910
1 2010-08-31 12:40:00      33.250  33.250
2 2010-08-31 12:50:00      30.500  30.500
3 2010-08-31 13:00:00      27.065  27.065
4 2010-08-31 13:10:00      25.610  25.610
5 2013-06-07 02:10:00      16.970  16.970
6 2013-06-07 02:20:00      16.955  16.955
7 2013-06-07 02:30:00      17.000  17.000
8 2013-06-07 02:40:00      17.015  17.015
9 2013-06-07 02:50:00      16.910  16.910
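
To make the effect visible when the same time of day occurs on more than one day, here is a small sketch with a hypothetical two-day sample (the values are made up). It contrasts transform('mean'), which keeps one row per original row, with mean(), which returns one row per (hour, minute) group:

```python
import pandas as pd

# Hypothetical two-day sample: the same two times of day on two dates.
df = pd.DataFrame({
    "Time": ["2013-06-06 02:10:00", "2013-06-06 02:20:00",
             "2013-06-07 02:10:00", "2013-06-07 02:20:00"],
    "some value": [10.0, 20.0, 30.0, 40.0],
})
df["Time"] = pd.to_datetime(df["Time"])

# Grouping keys: hour and minute of each timestamp, named for readability.
keys = [df["Time"].dt.hour.rename("hour"),
        df["Time"].dt.minute.rename("minute")]

# transform broadcasts each group mean back onto every original row.
df["newCol"] = df.groupby(keys)["some value"].transform("mean")

# mean() collapses each group to a single row per (hour, minute).
avg_day = df.groupby(keys)["some value"].mean()

print(df)
print(avg_day)
```

Here newCol repeats the 02:10 mean on both 02:10 rows, while avg_day holds just two entries, one per time of day.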

Upvotes: 1
