Gabe Spradlin
Gabe Spradlin

Reputation: 2057

Python - Zero-Order Hold Interpolation (Nearest Neighbor)

I will be shocked if there isn't some standard library function for this especially in numpy or scipy but no amount of Googling is providing a decent answer.

I am getting data from the Poloniex exchange - cryptocurrency. Think of it like getting stock prices - buy and sell orders - pushed to your computer. So what I have is timeseries of prices for any given market. One market might get an update 10 times a day while another gets updated 10 times a minute - it all depends on how many people are buying and selling on the market.

So my timeseries data will end up being something like:

[1 0.0003234,
 1.01 0.0003233,
 10.0004 0.00033,
 124.23 0.0003334,
 ...]

Where the 1st column is the time value (I use Unix timestamps to the microsecond but didn't think that was necessary in the example. The 2nd column would be one of the prices - either the buy or sell prices.

What I want is to convert it into a matrix where the data is "sampled" at a regular time frame. So the interpolated (zero-order hold) matrix would be:

[1 0.0003234,
 2 0.0003233,
 3 0.0003233,
 ...
 10 0.0003233,
 11 0.00033,
 12 0.00033,
 13 0.00033,
 ...
 120 0.00033,
 125 0.0003334,
 ...]

I want to do this with any reasonable time step. Right now I use np.linspace(start_time, end_time, time_step) to create the new time vector.

Writing my own, admittedly crude, zero-order hold interpolator won't be that hard. I'll loop through the original time vector and use np.nonzero to find all the indices in the new time vector which fit between one timestamp (t0) and the next (t1) then fill in those indices with the value from time t0.

For now, the crude method will work. The matrix of prices isn't that big. But I have to think there a faster method using one of the built-in libraries. I just can't find it.

Also, for the example above I only use a matrix of Nx2 (column 1: times, column 2: price) but ultimately the market has 6 or 8 different parameters that might get updated. A method/library function that could handled multiple prices and such in different columns would be great.

Python 3.5 via Anaconda on Windows 7 (hopefully won't matter).

TIA

Upvotes: 3

Views: 6715

Answers (1)

jotasi
jotasi

Reputation: 5177

For your problem you can use scipy.interpolate.interp1d. It seems to be able to do everything that you want. It is able to do a zero order hold interpolation if you specify kind="zero". It can also simultaniously interpolate multiple columns of a matrix. You will just have to specify the appropriate axis. f = interp1d(xData, yDataColumns, kind='zero', axis=0) will then return a function that you can evaluate at any point in the interpolation range. You can then get your normalized data by calling f(np.linspace(start_time, end_time, time_step).

Upvotes: 6

Related Questions