Reputation: 31
I have a time series with timestamps recorded any time an event occurs (so no given frequency). The precision of the timestamp being the millisecond. As this goes on for several thousands of lines and with many variables, I would like to create a new time range with a given frequency (here '5ms') and interpolate values at those time. So I tried this:
import pandas as pd
a = pd.DataFrame({"Time":pd.to_datetime(['2016-01-23 00:00:00.001',
'2016-01-23 00:00:00.013','2016-01-23 00:00:00.018',
'2016-01-23 00:00:00.024']),
"Value": [1,2,3,4]})
a = a.set_index(a["Time"])
b = pd.date_range(start='2016-01-23 00:00:00.00',
end='2016-01-23 00:00:00.025', freq='5ms')
c = a.reindex(b).interpolate(method="time")
>> Time Value
2016-01-23 00:00:00.000 NaT NaN
2016-01-23 00:00:00.005 NaT NaN
2016-01-23 00:00:00.010 NaT NaN
2016-01-23 00:00:00.015 NaT NaN
2016-01-23 00:00:00.020 NaT NaN
2016-01-23 00:00:00.025 NaT NaN
d=a.resample('5ms').interpolate()
>> Time Value
2016-01-23 00:00:00.000 NaT NaN
2016-01-23 00:00:00.005 NaT NaN
2016-01-23 00:00:00.010 NaT NaN
2016-01-23 00:00:00.015 NaT NaN
2016-01-23 00:00:00.020 NaT NaN
I guess that none of these solutions could work if the new time scale does not include the previous timestamps in it? I ended up going around this problem as follows:
e = a.reindex(a.index.union(b)).interpolate(method='time').reindex(b)
>> Time Value
2016-01-23 00:00:00.000 NaT NaN
2016-01-23 00:00:00.005 NaT 1.333333
2016-01-23 00:00:00.010 NaT 1.749995
2016-01-23 00:00:00.015 NaT 2.400031
2016-01-23 00:00:00.020 NaT 3.333348
2016-01-23 00:00:00.025 NaT 4.000000
But this looks heavy and not efficient to me. I would have expected that to be possible directly with the interpolation function. Any ideas?
Upvotes: 2
Views: 1453
Reputation: 3677
It is possible to use resample, if you set the time on the index.
a = pd.DataFrame({"Time":pd.to_datetime(['2016-01-23 00:00:00.001',
'2016-01-23 00:00:00.013','2016-01-23 00:00:00.018',
'2016-01-23 00:00:00.024']),
"Value": [1,2,3,4]})
a.set_index('Time', inplace=True)
print(a.resample('1ms').interpolate().resample('5ms').first())
output:
Value
Time
2016-01-23 00:00:00.000 1.000000
2016-01-23 00:00:00.005 1.333333
2016-01-23 00:00:00.010 1.750000
2016-01-23 00:00:00.015 2.400000
2016-01-23 00:00:00.020 3.333333
Still a bit workaround-ish. But it's something!
Resampling to 5 microseconds straight away gives a more coarse interpolation:
print(a.resample('5ms').first().interpolate())
Value
Time
2016-01-23 00:00:00.000 1.0
2016-01-23 00:00:00.005 1.5
2016-01-23 00:00:00.010 2.0
2016-01-23 00:00:00.015 3.0
2016-01-23 00:00:00.020 4.0
Upvotes: 1