Reputation: 870
I have a pandas dataframe with a column of speeds in KmH, and a column of timestamps:
Date, Speed
2016-07-07 13:38:02.000, 50.718590
2016-07-18 11:28:00.000, 2.357645
2016-07-15 15:03:08.000, 14.652172
2016-07-18 06:53:00.000, 24.530390
... ...
2016-07-18 18:41:31.000, 31.761416
2016-07-14 05:28:42.187, 7.532758
What I want is to have a harmonic average speed for each 15 minutes in an average day:
Time, Speed
00:00, 32
00:15, 10
00:30, 12
00:45, 41
01:00, 12
...
23:30, 30
23:45, 31
My initial attempt was to remove the dates from each timestamp, set it as the index, and use TimeGrouper to find the average. (My dataframe is called output) The code was:
output['Speed'] = output['Speed']**-1
output['Date'] = output['Date'].apply( lambda d : d.time() )
output = output.set_index(['Date'])
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1
The code didn't work however as it gave me an error:
Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
Upvotes: 2
Views: 581
Reputation: 394179
I think what you want to do is normalise the dates, then you do the resampling:
In [177]:
df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))
df
Out[177]:
Date Speed
0 2017-02-07 13:38:02 50.718590
1 2017-02-07 11:28:00 2.357645
2 2017-02-07 15:03:08 14.652172
3 2017-02-07 06:53:00 24.530390
4 2017-02-07 18:41:31 31.761416
5 2017-02-07 05:28:42 7.532758
now all dates are same, by default today's date, then do what you want:
In [178]:
output = df.set_index('Date')
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1
output
Out[178]:
Speed
Date
2017-02-07 05:15:00 0.132754
2017-02-07 05:30:00 NaN
2017-02-07 05:45:00 NaN
2017-02-07 06:00:00 NaN
2017-02-07 06:15:00 NaN
2017-02-07 06:30:00 NaN
2017-02-07 06:45:00 0.040766
2017-02-07 07:00:00 NaN
2017-02-07 07:15:00 NaN
2017-02-07 07:30:00 NaN
2017-02-07 07:45:00 NaN
2017-02-07 08:00:00 NaN
2017-02-07 08:15:00 NaN
2017-02-07 08:30:00 NaN
2017-02-07 08:45:00 NaN
2017-02-07 09:00:00 NaN
2017-02-07 09:15:00 NaN
2017-02-07 09:30:00 NaN
2017-02-07 09:45:00 NaN
2017-02-07 10:00:00 NaN
2017-02-07 10:15:00 NaN
2017-02-07 10:30:00 NaN
2017-02-07 10:45:00 NaN
2017-02-07 11:00:00 NaN
2017-02-07 11:15:00 0.424152
2017-02-07 11:30:00 NaN
2017-02-07 11:45:00 NaN
2017-02-07 12:00:00 NaN
2017-02-07 12:15:00 NaN
2017-02-07 12:30:00 NaN
2017-02-07 12:45:00 NaN
2017-02-07 13:00:00 NaN
2017-02-07 13:15:00 NaN
2017-02-07 13:30:00 0.019717
2017-02-07 13:45:00 NaN
2017-02-07 14:00:00 NaN
2017-02-07 14:15:00 NaN
2017-02-07 14:30:00 NaN
2017-02-07 14:45:00 NaN
2017-02-07 15:00:00 0.068249
2017-02-07 15:15:00 NaN
2017-02-07 15:30:00 NaN
2017-02-07 15:45:00 NaN
2017-02-07 16:00:00 NaN
2017-02-07 16:15:00 NaN
2017-02-07 16:30:00 NaN
2017-02-07 16:45:00 NaN
2017-02-07 17:00:00 NaN
2017-02-07 17:15:00 NaN
2017-02-07 17:30:00 NaN
2017-02-07 17:45:00 NaN
2017-02-07 18:00:00 NaN
2017-02-07 18:15:00 NaN
2017-02-07 18:30:00 0.031485
so this:
df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))
what this does is to extract as a string just the time using dt.strftime
, we can then use to_datetime
to make a datetime64 series where all dates are the same
Upvotes: 2