Gab
Gab

Reputation: 31

Pandas interpolate with new time scale

I have a time series with timestamps recorded any time an event occurs (so no given frequency). The precision of the timestamp being the millisecond. As this goes on for several thousands of lines and with many variables, I would like to create a new time range with a given frequency (here '5ms') and interpolate values at those time. So I tried this:

import pandas as pd

a = pd.DataFrame({"Time":pd.to_datetime(['2016-01-23 00:00:00.001',
'2016-01-23 00:00:00.013','2016-01-23 00:00:00.018',
'2016-01-23 00:00:00.024']),
"Value": [1,2,3,4]})

a = a.set_index(a["Time"])
b = pd.date_range(start='2016-01-23 00:00:00.00',
  end='2016-01-23 00:00:00.025', freq='5ms')
c = a.reindex(b).interpolate(method="time")
>>                        Time  Value
  2016-01-23 00:00:00.000  NaT    NaN
  2016-01-23 00:00:00.005  NaT    NaN
  2016-01-23 00:00:00.010  NaT    NaN
  2016-01-23 00:00:00.015  NaT    NaN
  2016-01-23 00:00:00.020  NaT    NaN
  2016-01-23 00:00:00.025  NaT    NaN
d=a.resample('5ms').interpolate()
>>                        Time  Value                             
  2016-01-23 00:00:00.000  NaT    NaN
  2016-01-23 00:00:00.005  NaT    NaN
  2016-01-23 00:00:00.010  NaT    NaN
  2016-01-23 00:00:00.015  NaT    NaN
  2016-01-23 00:00:00.020  NaT    NaN

I guess that none of these solutions could work if the new time scale does not include the previous timestamps in it? I ended up going around this problem as follows:

e = a.reindex(a.index.union(b)).interpolate(method='time').reindex(b)
>>                        Time     Value
  2016-01-23 00:00:00.000  NaT       NaN
  2016-01-23 00:00:00.005  NaT  1.333333
  2016-01-23 00:00:00.010  NaT  1.749995
  2016-01-23 00:00:00.015  NaT  2.400031
  2016-01-23 00:00:00.020  NaT  3.333348
  2016-01-23 00:00:00.025  NaT  4.000000

But this looks heavy and not efficient to me. I would have expected that to be possible directly with the interpolation function. Any ideas?

Upvotes: 2

Views: 1453

Answers (1)

PdevG
PdevG

Reputation: 3677

It is possible to use resample, if you set the time on the index.

a = pd.DataFrame({"Time":pd.to_datetime(['2016-01-23 00:00:00.001',
   '2016-01-23 00:00:00.013','2016-01-23 00:00:00.018',
   '2016-01-23 00:00:00.024']),
   "Value": [1,2,3,4]})

a.set_index('Time', inplace=True)    

print(a.resample('1ms').interpolate().resample('5ms').first())

output:

                        Value
Time                             
2016-01-23 00:00:00.000  1.000000
2016-01-23 00:00:00.005  1.333333
2016-01-23 00:00:00.010  1.750000
2016-01-23 00:00:00.015  2.400000
2016-01-23 00:00:00.020  3.333333

Still a bit workaround-ish. But it's something!

Resampling to 5 microseconds straight away gives a more coarse interpolation:

print(a.resample('5ms').first().interpolate())

                         Value
Time                          
2016-01-23 00:00:00.000    1.0
2016-01-23 00:00:00.005    1.5
2016-01-23 00:00:00.010    2.0
2016-01-23 00:00:00.015    3.0
2016-01-23 00:00:00.020    4.0

Upvotes: 1

Related Questions