Reputation: 83
I have a raw dataset that looks like this:
df = pd.DataFrame({'speed': [66.8,67,67.1,70,69],
'time': ['2017-08-09T05:41:30.168Z', '2017-08-09T05:41:31.136Z', '2017-08-09T05:41:31.386Z', '2017-08-09T05:41:31.103Z','2017-08-09T05:41:35.563Z' ]})
I could do some processing on it to make it look like (removed microseconds):
df['time']= pd.to_datetime(df.time)
df['time'] = df['time'].apply(lambda x: x.replace(microsecond=0))
>>> df
speed time
0 66.8 2017-08-09 05:41:30
1 67.0 2017-08-09 05:41:31
2 67.1 2017-08-09 05:41:31
3 70.0 2017-08-09 05:41:31
4 69.0 2017-08-09 05:41:35
I need to now resample the data so that any entries that arrived at the same timestamp are averaged together, and for the timestamps that did not receive any data, use the last available value. Like:
speed time
0 66.80 2017-08-09 05:41:30
1 68.03 2017-08-09 05:41:31
2 70.00 2017-08-09 05:41:32
3 70.00 2017-08-09 05:41:33
4 70.00 2017-08-09 05:41:34
5 69.00 2017-08-09 05:41:35
I understand this might involve the use of groupby and resample, but being a beginner I find myself struggling with these. Any ideas on how to proceed?
I have tried this but I am getting wrong results:
df.groupby( [df["time"].dt.second]).mean()
speed
time
30 66.800000
31 68.033333
35 69.000000
Upvotes: 8
Views: 6124
Reputation: 210952
In [279]: df.resample('1S', on='time').mean().ffill()
Out[279]:
speed
time
2017-08-09 05:41:30 66.800000
2017-08-09 05:41:31 68.033333
2017-08-09 05:41:32 68.033333
2017-08-09 05:41:33 68.033333
2017-08-09 05:41:34 68.033333
2017-08-09 05:41:35 69.000000
Upvotes: 11