Avoid interpolation problems with numpy float

Question

I need to handle timespans in a library I am creating. My first idea was to keep it simple and encode them as years, using floats.

The problems arise, for instance, when I wish to perform interpolations. Say I have

xs = np.array([0, 0.7, 1.2, 3.0])  # times
ys = np.array([np.nan, 124.3, 214.0, np.nan])  # values associated

Outside the [0.7, 1.2] interval I would like to get np.nan, but inside it the obvious linear interpolation, including at the endpoints themselves.

However, using

#!/usr/bin/python3.5

import numpy as np
from fractions import Fraction

import scipy.interpolate as scInt

if __name__ == "__main__":
    xs = np.array([0, 0.7, 1.2, 3.0])  # times
    ys = np.array([np.nan, 124.3, 214.0, np.nan])  # values associated
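    interp = scInt.interp1d(xs, ys)  # linear interpolation by default
    # Query 0, both ends of [0.7, 1.2], and points just either side of 1.2
    xsInt = np.array([0, 7/10, 6/5-0.0001, 6/5, 6/5+0.0001])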
    print(interp(xsInt))

I get

[nan, 124.3, 213.98206, nan, nan]

So I get the correct value at 7/10, but nan at 6/5, which is 1.2. There is no mystery in this: the machine representation of floats can cause things like this. Still, it is an issue I need to deal with.
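A quick check suggests it may be worth confirming that representation error is really the cause here. `6/5` and `1.2` are the same double, so the query hits the knot exactly. What can still produce `nan` is the linear formula itself: evaluated on the segment [1.2, 3.0], whose right endpoint is `nan`, the weight on `nan` is zero at x = 1.2, but `nan * 0` is still `nan`. Which neighboring segment a given SciPy version uses at an exact knot is an implementation detail, so the sketch below (with a hypothetical helper `segment_value`) is a hypothesis to verify, not a diagnosis.

```python
import numpy as np

# The values typed in the question are the same doubles as the knots.
print(6 / 5 == 1.2, 7 / 10 == 0.7)  # True True


def segment_value(x, x_lo, x_hi, y_lo, y_hi):
    """Hypothetical helper: plain linear interpolation on one segment [x_lo, x_hi]."""
    slope = (y_hi - y_lo) / (x_hi - x_lo)
    return slope * (x - x_lo) + y_lo


# x = 1.2 evaluated on the segment to its left, [0.7, 1.2]: finite (about 214.0).
left = segment_value(1.2, 0.7, 1.2, 124.3, 214.0)

# x = 1.2 evaluated on the segment to its right, [1.2, 3.0], whose far end is
# NaN: the slope is NaN, and NaN * 0.0 is still NaN, so the result is NaN.
right = segment_value(1.2, 1.2, 3.0, 214.0, np.nan)

print(left, right)
```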

My first idea was to double the values in xs, so that I would interpolate on [x1-eps, x1+eps, x2-eps, x2+eps, ..., xn-eps, xn+eps], repeating each entry of the ys vector twice: [y1, y1, y2, y2, y3, y3, ..., yn, yn]. This works, but it is quite ugly. Then I thought I would use fractions.Fraction instead, but NumPy complained that "object arrays are not supported". A pity, since this seemed the way to go, although it would surely cost some performance.
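The doubled-knot workaround could look like the following minimal sketch with `np.interp`. The helper name `interp_doubled` and the default `eps` are illustrative assumptions; `eps` has to be large compared with rounding error near the knots, yet small compared with the spacing between them.

```python
import numpy as np


def interp_doubled(x_query, xs, ys, eps=1e-9):
    """Hypothetical helper: interpolate on knots split into (x - eps, x + eps).

    Each sample y_i is held constant on [x_i - eps, x_i + eps], so a query that
    lands on (or within eps of) a knot returns y_i even when a neighboring
    sample is NaN.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # [x1-eps, x1+eps, x2-eps, x2+eps, ...]
    knots = np.column_stack((xs - eps, xs + eps)).ravel()
    # [y1, y1, y2, y2, ...]
    values = np.repeat(ys, 2)
    # Queries outside [x1-eps, xn+eps] also return NaN.
    return np.interp(x_query, knots, values, left=np.nan, right=np.nan)


xs = np.array([0, 0.7, 1.2, 3.0])
ys = np.array([np.nan, 124.3, 214.0, np.nan])
xs_int = np.array([0, 7/10, 6/5 - 0.0001, 6/5, 6/5 + 0.0001])
print(interp_doubled(xs_int, xs, ys))
```

The price is a small distortion: each ramp between knots is shortened by 2*eps, so values strictly between knots differ very slightly from plain linear interpolation.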

There is another side to this problem: it would be nice to create dictionaries keyed by times of the same kind, and I fear that when I look up a float key, some searches would fail for the same reason.
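The dictionary concern is well founded: a key computed by a different sequence of float operations can differ in the last bit from the stored key. The illustration below also shows `fractions.Fraction` keys as one possible alternative (an assumption here; the question only tried Fraction inside NumPy arrays). Plain dictionaries accept Fraction keys, and exact arithmetic means equal values give equal keys.

```python
from fractions import Fraction

# Float keys: 0.1 + 0.2 is not the same double as 0.3.
float_table = {0.3: "checkpoint"}
print(0.1 + 0.2 in float_table)  # False

# Fraction keys: arithmetic is exact, so equal values give equal keys.
frac_table = {Fraction(3, 10): "checkpoint"}
print(Fraction(1, 10) + Fraction(2, 10) in frac_table)  # True

# Converting from a float keeps the float's exact binary value,
# so build Fractions from integers or strings instead.
print(Fraction(0.3) == Fraction(3, 10))    # False
print(Fraction("0.3") == Fraction(3, 10))  # True
```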

My last idea was to use dates, like datetime.date, but I am not too happy with it because of the ambiguity in converting a difference between dates into a fraction of a year.
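The ambiguity is easy to see with the standard library. Subtracting two `datetime.date` objects gives a `timedelta` in whole days, and turning days into years requires a year-length convention that the dates themselves do not fix. The two divisors below are common conventions, shown only for illustration.

```python
from datetime import date

# One calendar year that happens to contain a leap day (2016-02-29).
start, end = date(2016, 1, 1), date(2017, 1, 1)
days = (end - start).days
print(days)           # 366
print(days / 365)     # 365-day convention: slightly more than 1
print(days / 365.25)  # Julian-year convention: also not exactly 1
```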

What would be the best approach here? Is there a nice solution?

Konstantin Schubert · Accepted Answer

I think there is just no easy way out of this. Floats are fundamentally unsuitable for equality checks, and by evaluating your interpolation on the edges of its domain (or using floats as keys in dictionaries), you are doing exactly this.
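A small illustration of the edge-of-domain problem, assuming the query time is computed rather than typed in: three steps of 0.4 years are meant to land on 1.2, the last valid time, but `0.4 * 3` rounds to the double just above `1.2`, so an exact range check rejects it. A tolerance comparison with `math.isclose` accepts it; the absolute tolerance here is an arbitrary illustrative choice, and picking one for real data is a design decision of its own.

```python
import math

hi = 1.2     # last time with a valid sample (years)
t = 0.4 * 3  # "three steps of 0.4 years", intended to equal hi

print(t == hi)  # False: t is the next double above 1.2
print(t <= hi)  # False: an exact in-range check rejects t
# Tolerance-based comparison; abs_tol=1e-9 is an arbitrary illustrative choice.
print(math.isclose(t, hi, rel_tol=0.0, abs_tol=1e-9))  # True
```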

Your solution using epsilons is a bit hacky, but honestly there probably is no more elegant way of working around this problem.

In general, having to check floats for equality can be a symptom of a bad design choice. You recognized this, since you mentioned that you were considering datetime.date (which, I agree, is overkill).

The best way to go is to accept that the interpolation is not defined on the edges of its domain and to work this assumption into the design of the program. The exact solution then depends on what you want to do.
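One way to work the domain into the design, sketched here as a suggestion rather than as part of this answer, is to stop encoding "no data" as NaN samples: keep only the finite samples and let the interpolator return NaN outside their range. `np.interp` does this with `left=np.nan, right=np.nan`; in SciPy, `interp1d(x, y, bounds_error=False)` (whose default `fill_value` is NaN) behaves similarly. Because every remaining segment has finite endpoints, no NaN can propagate into the values at 0.7 and 1.2. The factory name `make_interpolator` is hypothetical.

```python
import numpy as np


def make_interpolator(xs, ys):
    """Hypothetical factory: linear interpolator on [first finite x, last finite x].

    Samples with NaN values are dropped up front; queries outside the
    remaining range return NaN instead of borrowing a NaN neighbor.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    valid = np.isfinite(ys)
    x_ok, y_ok = xs[valid], ys[valid]

    def interp(x_query):
        return np.interp(x_query, x_ok, y_ok, left=np.nan, right=np.nan)

    return interp


interp = make_interpolator([0, 0.7, 1.2, 3.0], [np.nan, 124.3, 214.0, np.nan])
print(interp(np.array([0, 7/10, 6/5 - 0.0001, 6/5, 6/5 + 0.0001])))
```

Note that this also drops NaN samples in the middle of the series, bridging such gaps linearly; if interior gaps must stay undefined, they need separate handling.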

Have you considered using seconds or days instead of years? With seconds, you may be able to avoid querying the interpolation at the boundaries of its domain. And if you only use integer numbers of seconds, you can easily use them as keys in your dictionary.
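A minimal sketch of the integer-seconds idea, assuming times are stored as whole seconds from some fixed origin (the constant names and sample values are illustrative). Integer arithmetic is exact, so a key reached by adding different offsets still finds the stored entry. Integers below 2**53 also convert to float64 without rounding, so the same values can serve as interpolation knots.

```python
SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600

# Samples keyed by whole seconds since an arbitrary origin.
samples = {
    7 * SECONDS_PER_DAY: 124.3,
    12 * SECONDS_PER_DAY: 214.0,
}

# Reach the same instant by a different route: 10 days + 48 hours.
t = 10 * SECONDS_PER_DAY + 48 * SECONDS_PER_HOUR
print(t in samples, samples[t])  # True 214.0

# Integers of this size are represented exactly as floats.
print(float(t) == t)  # True
```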
