Reputation: 951
I have data from several days (or even weeks), taken at exactly the same time intervals each day, and I want to compute an average daily curve. So far I tried a daily average, but then I get one average value per day. What I need is one value for each time of day, averaged over all available days. This is most probably really easy if you know the right commands; unfortunately I am quite new to pandas. Even just a hint about where to look in the documentation would be great!
Time some value
2010-08-31 12:30:00 33.910
2010-08-31 12:40:00 33.250
2010-08-31 12:50:00 30.500
2010-08-31 13:00:00 27.065
2010-08-31 13:10:00 25.610
...
2013-06-07 02:10:00 16.970
2013-06-07 02:20:00 16.955
2013-06-07 02:30:00 17.000
2013-06-07 02:40:00 17.015
2013-06-07 02:50:00 16.910
Upvotes: 1
Views: 961
Reputation: 862731
You can try groupby by hours and minutes, and transform with mean:
print(df)
Time some value
0 2010-08-31 12:30:00 33.910
1 2010-08-31 12:40:00 33.250
2 2010-08-31 12:50:00 30.500
3 2010-08-31 13:00:00 27.065
4 2010-08-31 13:10:00 25.610
5 2013-06-07 02:10:00 16.970
6 2013-06-07 02:20:00 16.955
7 2013-06-07 02:30:00 17.000
8 2013-06-07 02:40:00 17.015
9 2013-06-07 02:50:00 16.910
#convert column time to datetime
df['Time'] = pd.to_datetime(df['Time'])
#set index from column Time
df = df.set_index('Time')
print(df)
some value
Time
2010-08-31 12:30:00 33.910
2010-08-31 12:40:00 33.250
2010-08-31 12:50:00 30.500
2010-08-31 13:00:00 27.065
2010-08-31 13:10:00 25.610
2013-06-07 02:10:00 16.970
2013-06-07 02:20:00 16.955
2013-06-07 02:30:00 17.000
2013-06-07 02:40:00 17.015
2013-06-07 02:50:00 16.910
print(df.groupby([df.index.hour, df.index.minute])['some value'].transform('mean'))
Time
2010-08-31 12:30:00 33.910
2010-08-31 12:40:00 33.250
2010-08-31 12:50:00 30.500
2010-08-31 13:00:00 27.065
2010-08-31 13:10:00 25.610
2013-06-07 02:10:00 16.970
2013-06-07 02:20:00 16.955
2013-06-07 02:30:00 17.000
2013-06-07 02:40:00 17.015
2013-06-07 02:50:00 16.910
dtype: float64
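Since the question asks for a single value per time of day rather than one value per row, aggregating with mean instead of transform may be closer to what is wanted. A minimal sketch, assuming a small made-up frame with two days of readings at the same times (the values and the name daily_curve are illustrative only, not taken from the question's data):

```python
import pandas as pd

# Hypothetical sample: two days sampled at the same 10-minute times.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2010-08-31 12:30:00", "2010-08-31 12:40:00",
        "2010-09-01 12:30:00", "2010-09-01 12:40:00",
    ]),
    "some value": [30.0, 20.0, 34.0, 26.0],
}).set_index("Time")

# Group by time of day and aggregate: one averaged value per time slot.
daily_curve = df.groupby(df.index.time)["some value"].mean()
print(daily_curve)
```

Here the result has one row per distinct time of day (12:30 and 12:40), each holding the mean over both days.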
The next solution does not set the index to a DatetimeIndex. Instead it uses dt.hour and dt.minute and creates a new column newCol:
print(df)
Time some value
0 2010-08-31 12:30:00 33.910
1 2010-08-31 12:40:00 33.250
2 2010-08-31 12:50:00 30.500
3 2010-08-31 13:00:00 27.065
4 2010-08-31 13:10:00 25.610
5 2013-06-07 02:10:00 16.970
6 2013-06-07 02:20:00 16.955
7 2013-06-07 02:30:00 17.000
8 2013-06-07 02:40:00 17.015
9 2013-06-07 02:50:00 16.910
#convert column time to datetime
df['Time'] = pd.to_datetime(df['Time'])
print(df)
Time some value
0 2010-08-31 12:30:00 33.910
1 2010-08-31 12:40:00 33.250
2 2010-08-31 12:50:00 30.500
3 2010-08-31 13:00:00 27.065
4 2010-08-31 13:10:00 25.610
5 2013-06-07 02:10:00 16.970
6 2013-06-07 02:20:00 16.955
7 2013-06-07 02:30:00 17.000
8 2013-06-07 02:40:00 17.015
9 2013-06-07 02:50:00 16.910
df['newCol'] = (df.groupby([df['Time'].dt.hour, df['Time'].dt.minute])['some value']
                  .transform('mean'))
print(df)
Time some value newCol
0 2010-08-31 12:30:00 33.910 33.910
1 2010-08-31 12:40:00 33.250 33.250
2 2010-08-31 12:50:00 30.500 30.500
3 2010-08-31 13:00:00 27.065 27.065
4 2010-08-31 13:10:00 25.610 25.610
5 2013-06-07 02:10:00 16.970 16.970
6 2013-06-07 02:20:00 16.955 16.955
7 2013-06-07 02:30:00 17.000 17.000
8 2013-06-07 02:40:00 17.015 17.015
9 2013-06-07 02:50:00 16.910 16.910
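In the ten sample rows each time of day occurs only once, so newCol simply repeats the original values. A minimal sketch with made-up data spanning two days, to show that transform writes the cross-day average back onto every row (the values and the renamed keys hour and minute are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample: the same two time slots on two different days.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2010-08-31 12:30:00", "2010-08-31 12:40:00",
        "2010-09-01 12:30:00", "2010-09-01 12:40:00",
    ]),
    "some value": [30.0, 20.0, 34.0, 26.0],
})

# Grouping keys derived from the Time column; renamed so they have distinct names.
hour = df["Time"].dt.hour.rename("hour")
minute = df["Time"].dt.minute.rename("minute")

# transform keeps the original row count and fills in each group's mean.
df["newCol"] = df.groupby([hour, minute])["some value"].transform("mean")
print(df)
```

Both 12:30 rows receive the average of 30.0 and 34.0, and both 12:40 rows receive the average of 20.0 and 26.0.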
Upvotes: 1