Reputation: 1130
I have a time series that looks like:
timeseries1 = [{'price': 250, 'time': 1.52},
               {'price': 251, 'time': 3.65},
               {'price': 253, 'time': 10.1},
               {'price': 254, 'time': 10.99}]
I want to be able to interpolate this data so that it moves forward in small timesteps, and have something like:
timeStep = 0.1
timeseries2 = [{'price': 250, 'time': 1.5},
               {'price': 250, 'time': 1.6},
               {'price': 250, 'time': 1.7},
               ...
               {'price': 250, 'time': 3.6},
               {'price': 251, 'time': 3.7},
               {'price': 251, 'time': 3.8},
               {'price': 251, 'time': 3.9},
               ...
               {'price': 251, 'time': 10.0},
               {'price': 253, 'time': 10.1},
               {'price': 253, 'time': 10.2},
               {'price': 253, 'time': 10.3},
               ...
               {'price': 253, 'time': 10.9},
               {'price': 254, 'time': 11.0}]
I'm unsure how to do this efficiently and am hoping there is a nice Pythonic way. So far I've tried iterating through timeseries1 with an inner while loop that appends new values to the end of timeseries2, but having two nested loops seems very inefficient.
Edit: Here is the code/algorithm currently being used to do this.
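import math

# snap the first timestamp down to the time grid
startTime = math.floor(timeseries1[0]['time'] / timeStep) * timeStep
oldPrice = timeseries1[0]['price']
timeseries3 = []
for x in timeseries1[1:]:
    # fill every step before the next sample with the last known price
    while startTime < x['time']:
        timeseries3.append({'price': oldPrice, 'time': startTime})
        startTime += timeStep
    oldPrice = x['price']
# emit the final price at the first step at or after the last sample
timeseries3.append({'price': oldPrice, 'time': startTime})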
So that timeseries3 will be the same as timeseries2 in the end.
Upvotes: 0
Views: 1548
Reputation: 94
Try RedBlackPy. Its RedBlackPy.Series class is built on red-black trees for convenient work with time series, and it has interpolation methods built into the getitem operator (Series[key]).
import redblackpy as rb
time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# create a Series with 'floor' interpolation:
# at any time t it returns the last known value (your case)
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )
# now you can read any key via interpolation, without inserting it,
# and create a new series with the required time step
# args of the uniform method: (start, end, step)
new_series = series.uniform(1.5, 11, 0.1)
# the required result
print(new_series)
The output of the last print is the following (including floating-point rounding artifacts in the keys):
Series object Untitled
1.5: 0.0
1.6: 250.0
1.7000000000000002: 250.0
1.8000000000000003: 250.0
1.9000000000000004: 250.0
2.0000000000000004: 250.0
2.1000000000000005: 250.0
...
9.89999999999998: 251.0
9.99999999999998: 251.0
10.09999999999998: 251.0
10.19999999999998: 253.0
10.29999999999998: 253.0
10.399999999999979: 253.0
10.499999999999979: 253.0
10.599999999999978: 253.0
10.699999999999978: 253.0
10.799999999999978: 253.0
10.899999999999977: 253.0
10.999999999999977: 254.0
Remember that with interpolation you can read the series at any key. You don't have to create a new series if you only want to iterate over it with a uniform time step. You can do that with RedBlackPy.Series using no additional memory:
import redblackpy as rb

# generator for a uniform time grid from start up to stop
def grid_generator(start, stop, step):
    it = start
    while it <= stop:
        yield it
        it += step

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# create a Series with 'floor' interpolation:
# at any time t it returns the last known value (your case)
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )
# now iterate over the grid; the Series itself still holds only 4 elements
for key in grid_generator(1.6, 11, 0.1):
    print(series[key])  # prints the last known value (your case)
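If an extra dependency is not an option, the same "floor" (last known value) lookup can be sketched with the standard library's bisect module. This is only a sketch under assumptions: resample_floor and its ndigits parameter are hypothetical names introduced here, the input is assumed to be sorted by time, and grid points before the first sample are assumed to take the first price, as in the question's expected output. The grid is built from integer multiples of the step, so the rounding artifacts shown above do not accumulate.

```python
import math
from bisect import bisect_right

def resample_floor(series, step, ndigits=1):
    """Resample [{'price', 'time'}, ...] onto a uniform grid,
    carrying the last known price forward (hypothetical helper)."""
    times = [point['time'] for point in series]
    prices = [point['price'] for point in series]
    # Grid bounds as integer multiples of step: snap the first time
    # down and the last time up to the grid.
    first = int(math.floor(times[0] / step))
    last = int(math.ceil(times[-1] / step))
    result = []
    for i in range(first, last + 1):
        # Multiply instead of repeatedly adding, then round for clean keys.
        t = round(i * step, ndigits)
        # Index of the last sample whose time is <= t (-1 if none).
        k = bisect_right(times, t) - 1
        result.append({'price': prices[max(k, 0)], 'time': t})
    return result

timeseries1 = [{'price': 250, 'time': 1.52},
               {'price': 251, 'time': 3.65},
               {'price': 253, 'time': 10.1},
               {'price': 254, 'time': 10.99}]
timeseries2 = resample_floor(timeseries1, 0.1)
```

Each grid point costs one binary search over the original samples, so there is no inner while loop over time steps.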
Upvotes: 1
Reputation: 4455
...hope there will be a nice pythonic way to do so.
Here's a pythonic way of generating a list: using a generator! However, I must admit that the following code has issues:
def timeseries(t1, t2, p1, coeff, step):
    # linear interpolation: the price grows by coeff per unit of time
    t = t1
    while t <= t2:
        yield {'price': int(p1 + (t - t1) * coeff), 'time': t}
        t += step

print(list(timeseries(1.5, 11, 250, 0.43, 0.1)))
So, the generator might be a "fun" way to create your time series. However, it needs work because of the floating-point arithmetic problems I'm seeing when I run it:
[{'price': 250, 'time': 1.5}, {'price': 250, 'time': 1.6}, {'price': 250, 'time': 1.7000000000000002}, {'price': 250, 'time': 1.8000000000000003}, {'price': 250, 'time': 1.9000000000000004}, {'price': 250, 'time': 2.0000000000000004}, {'price': 250, 'time': 2.1000000000000005}, {'price': 250, 'time': 2.2000000000000006}, {'price': 250, 't...
While I think the above code is easy to read (well, the variable names could have been more descriptive, and a comment or two would have been nice), here is an even tighter piece of Python code that accomplishes the same thing. Instead of declaring a generator function, it uses a generator expression.
For completeness, I've added a line to figure out the slope of the data to perform the interpolation.
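(t1, p1, t2, p2) = (1.52, 250.0, 10.99, 254.0)
coeff = (p2 - p1) / (t2 - t1)  # slope of the line through the two end points
# i counts tenths of a time unit; the price offset is truncated to two decimals
print(list({'time': i / 10.0, 'price': p1 + int((i / 10.0 - t1) * coeff * 100) / 100.0} for i in range(int(t1 * 10), int(t2 * 10))))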
The code could be generalized even further. The 10.0 and 100 values are there to do the stepping in integer math and keep only the significant digits that we care about. This is cleaner than the previous code, where the time value drifted simply from repeatedly adding the step of 0.1 to the current time t ( t += step ). This site talks about using an frange generator built on decimal.Decimal. In my Python 2.7 environment I couldn't get that to work properly, so I just hard-coded the scale and significant digits into the formula (again, not very general).
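For reference, a Decimal-based frange might look roughly like the sketch below. It is an assumption of what such a generator could be, not the code from the linked site: the name frange, the inclusive stop, and the conversion through str() are choices made here. Building each Decimal from a string keeps 0.1 as exactly one tenth, so repeated addition does not drift.

```python
from decimal import Decimal

def frange(start, stop, step):
    """Yield floats from start up to and including stop,
    stepping in exact decimal arithmetic (hypothetical helper)."""
    # Decimal(str(0.1)) is exactly 0.1, whereas Decimal(0.1) would
    # copy the binary rounding error of the float.
    current = Decimal(str(start))
    stop = Decimal(str(stop))
    step = Decimal(str(step))
    while current <= stop:
        yield float(current)
        current += step

times = list(frange(1.5, 11, 0.1))
```

The resulting times could then replace the hard-coded i/10.0 scaling in the generator expression.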
Upvotes: 0