mojo1mojo2

Reputation: 1130

Python Timeseries interpolation

I have a time series that looks like:

timeseries1 = [{'price': 250, 'time': 1.52},
    {'price': 251, 'time': 3.65},
    {'price': 253, 'time': 10.1},
    {'price': 254, 'time': 10.99}]

I want to interpolate this data onto a regular grid of small time steps, carrying the last known price forward, to get something like:

timeStep = 0.1
timeseries2 = [{'price': 250, 'time': 1.5},
    {'price': 250, 'time': 1.6},
    {'price': 250, 'time': 1.7},
    ...
    {'price': 250, 'time': 3.6},
    {'price': 251, 'time': 3.7},
    {'price': 251, 'time': 3.8},
    {'price': 251, 'time': 3.9},
    ...
    {'price': 251, 'time': 10.0},
    {'price': 253, 'time': 10.1},
    {'price': 253, 'time': 10.2},
    {'price': 253, 'time': 10.3},
    ...
    {'price': 253, 'time': 10.9},
    {'price': 254, 'time': 11.0}]

I'm really unsure how to do this efficiently and hope there will be a nice pythonic way to do so. What I've tried is iterating through timeseries1 with an inner while loop that appends new values to the end of timeseries2, but having two nested loops seems very inefficient.

Edit: Here is the code/algorithm currently being used to do this.

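import math

startTime = math.floor(timeseries1[0]['time'] / timeStep) * timeStep
oldPrice = timeseries1[0]['price']
timeseries3 = []
for x in timeseries1[1:]:
    # fill every grid time before the next observation with the last known price
    while startTime < x['time']:
        timeseries3.append({'price': oldPrice, 'time': startTime})
        # round to suppress floating-point drift from repeated addition
        startTime = round(startTime + timeStep, 10)
    oldPrice = x['price']
# the final observation's price goes at the first grid time at or after it
timeseries3.append({'price': oldPrice, 'time': startTime})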

In the end, timeseries3 should be the same as timeseries2.
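For comparison, here is a minimal single-pass sketch of the same last-known-price fill, written as a generator. It assumes the input is sorted by 'time' and that the first price also covers grid times just before the first observation (as in timeseries2); the name forward_fill is hypothetical. Each grid time is computed from an integer index and rounded, instead of adding the step repeatedly.

```python
import math


def forward_fill(series, step):
    """Yield {'price', 'time'} dicts on a regular grid, carrying the last
    observed price forward. `series` must be sorted by 'time'."""
    # grid index at or below the first observation
    i = math.floor(series[0]['time'] / step)
    # grid index at or above the last observation
    last = math.ceil(series[-1]['time'] / step)
    k = 0  # index of the latest observation at or before the current grid time
    price = series[0]['price']
    while i <= last:
        t = round(i * step, 10)  # computed from the index, so no drift
        # advance past every observation whose time is at or before t
        while k + 1 < len(series) and series[k + 1]['time'] <= t:
            k += 1
            price = series[k]['price']
        yield {'price': price, 'time': t}
        i += 1


timeseries1 = [{'price': 250, 'time': 1.52},
               {'price': 251, 'time': 3.65},
               {'price': 253, 'time': 10.1},
               {'price': 254, 'time': 10.99}]
timeseries2 = list(forward_fill(timeseries1, 0.1))
```

The inner while loop only ever moves forward through the observations, so the total work is proportional to the number of observations plus the number of grid points, even though the loops are nested.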

Upvotes: 0

Views: 1548

Answers (2)

Try RedBlackPy. Its RedBlackPy.Series class is built on red-black trees for convenient work with time series, and it has interpolation methods built into the getitem operator (Series[key]).

import redblackpy as rb

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# create a Series with 'floor' interpolation:
# in your case, at time t you need the last known value
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )
# now you can access any key without inserting it, thanks to interpolation,
# and can create a new series with the time step you need
# args of the uniform method: (start, end, step)
new_series = series.uniform(1.5, 11, 0.1)
# required result!
print(new_series)

The output of the last print is as follows (note the floating-point artifacts in the keys):

Series object Untitled
1.5: 0.0
1.6: 250.0
1.7000000000000002: 250.0
1.8000000000000003: 250.0
1.9000000000000004: 250.0
2.0000000000000004: 250.0
2.1000000000000005: 250.0
...
9.89999999999998: 251.0
9.99999999999998: 251.0
10.09999999999998: 251.0
10.19999999999998: 253.0
10.29999999999998: 253.0
10.399999999999979: 253.0
10.499999999999979: 253.0
10.599999999999978: 253.0
10.699999999999978: 253.0
10.799999999999978: 253.0
10.899999999999977: 253.0
10.999999999999977: 254.0
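Keys such as 1.7000000000000002 come from adding 0.1 repeatedly: 0.1 has no exact binary floating-point representation, so each addition accumulates a small rounding error. A minimal sketch of a common workaround, computing each grid point from an integer index and rounding, is below; the helper names accumulate and grid and the choice of 10 decimal places are assumptions for illustration.

```python
def accumulate(start, stop, step):
    """Grid built by repeated addition (the t += step pattern); drifts."""
    out, t = [], start
    while t <= stop:
        out.append(t)
        t += step
    return out


def grid(start, stop, step, ndigits=10):
    """Grid built from an integer index, rounded to suppress drift."""
    n = round((stop - start) / step)  # number of whole steps from start to stop
    return [round(start + i * step, ndigits) for i in range(n + 1)]


drifting = accumulate(1.5, 11, 0.1)
clean = grid(1.5, 11, 0.1)
print(drifting[2], clean[2])
```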

Remember: with interpolation you can access any key! You don't have to create a new series if you just want to iterate over it with a uniform time step. You can do that with RedBlackPy.Series with no additional memory:

import redblackpy as rb

# create an iterator over time: start, start + step, ... up to and including stop
def grid_generator(start, stop, step):
    n = 0
    it = start
    while it <= stop:
        yield it
        n += 1
        # compute from the index rather than accumulating, to limit float drift
        it = start + n * step

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# create a Series with 'floor' interpolation:
# in your case, at time t you need the last known value
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )

# iterate over the uniform grid; the Series itself still holds only 4 elements
for key in grid_generator(1.6, 11, 0.1):
    print(series[key]) # prints the last known value (your case)
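The 'floor' lookup used in these snippets means that a query at key t returns the value stored at the largest index less than or equal to t. Here is a minimal pure-Python sketch of that idea using the standard bisect module; it is not RedBlackPy's implementation, the class name FloorSeries is hypothetical, and raising KeyError for keys before the first index is an arbitrary choice made here.

```python
import bisect


class FloorSeries:
    """Hypothetical sorted series whose lookups return the last known value."""

    def __init__(self, index, values):
        # sort the (index, value) pairs once so lookups can use binary search
        pairs = sorted(zip(index, values))
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def __getitem__(self, key):
        # position of the rightmost stored key that is <= key
        pos = bisect.bisect_right(self._keys, key) - 1
        if pos < 0:
            raise KeyError(key)  # no observation at or before this key
        return self._values[pos]


series = FloorSeries([1.52, 3.65, 10.1, 10.99], [250, 251, 253, 254])
```

Each lookup is a binary search, so it costs O(log n) and nothing is inserted into the series.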

Upvotes: 1

Mark

Reputation: 4455

...hope there will be a nice pythonic way to do so.

Here's a pythonic way of generating a list: using a generator! However, I must admit that the following code has issues:

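def timeseries(t1, t2, p1, coeff, step):
    # linearly interpolate price from (t1, p1) with slope coeff, every step
    t = t1
    while t <= t2:
        yield {'price': int(p1 + (t - t1) * coeff), 'time': t}
        t += step  # repeated float addition: the source of the drift below


print(list(timeseries(1.5, 11, 250, 0.43, 0.1)))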

So the generator might be a "fun" way to create your time series. However, it needs work because of the floating-point arithmetic problems I see when I run it:

[{'price': 250, 'time': 1.5}, {'price': 250, 'time': 1.6}, {'price': 250, 'time': 1.7000000000000002}, {'price': 250, 'time': 1.8000000000000003}, {'price': 250, 'time': 1.9000000000000004}, {'price': 250, 'time': 2.0000000000000004}, {'price': 250, 'time': 2.1000000000000005}, {'price': 250, 'time': 2.2000000000000006}, {'price': 250, 't...

While I think the above code is easy to read (although the variable names could be more descriptive and a comment or two would help), here is an even tighter piece of Python code that does the same thing. Instead of declaring a generator function, it uses a generator expression.

For completeness, I've added a line to figure out the slope of the data to perform the interpolation.

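(t1, p1, t2, p2) = (1.52, 250.0, 10.99, 254.0)
coeff = (p2 - p1) / (t2 - t1)  # slope between the two endpoints
# time on a 0.1 grid; price change since t1, truncated to two decimals
print(list({'time': i / 10.0, 'price': int((i / 10.0 - t1) * coeff * 100) / 100.0 + p1}
           for i in range(int(t1 * 10), int(t2 * 10))))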
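Written out, the linear interpolation behind coeff, with the endpoint values from the snippet plugged in as a worked check (not part of the original answer):

$$
p(t) = p_1 + (t - t_1)\,\frac{p_2 - p_1}{t_2 - t_1},
\qquad
\text{coeff} = \frac{254 - 250}{10.99 - 1.52} = \frac{4}{9.47} \approx 0.4224
$$

For example, at $t = 5.0$: $p(5.0) \approx 250 + 3.48 \times 0.4224 \approx 251.47$.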

The code could be generalized even further. The 10.0 and 100 values are there to perform integer math and keep only the significant digits we care about. This is cleaner than the previous code, where the time value got very wonky just from adding the step of 0.1 to the current time t (t += step). This site talks about using an frange generator built on decimal.Decimal. In my Python 2.7 environment I couldn't get that to work properly, so I just hard-coded the scale/significant digits into the formula (again, not very general).
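A minimal sketch of the frange-on-decimal.Decimal idea mentioned above; the function name and signature are assumptions, not taken from the referenced site. Passing the bounds and step as strings keeps the arithmetic exact in decimal, and each value is converted to float only when it is yielded.

```python
from decimal import Decimal


def frange(start, stop, step):
    """Yield floats from start to stop inclusive, stepping exactly in decimal.

    Pass start, stop and step as strings (e.g. '0.1') so they are parsed
    exactly; Decimal(0.1) would inherit the binary float's rounding error.
    """
    start, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    t = start
    while t <= stop:
        yield float(t)
        t += step


times = list(frange('1.5', '11', '0.1'))
```

Because the running total stays in decimal, 1.5 + 95 * 0.1 lands exactly on 11, so the endpoint is included and no 1.7000000000000002-style values appear.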

Upvotes: 0
