Linear interpolation of NaN's in pandas dataframe based on its two neighbors (above & below or left & right)

Question

I have a DataFrame with one column of float values and a timestamp index, similar to the one below. The timestamps in the index are sorted in ascending order.

timestamp           value
2014-01-08 08:00:42 1
2014-01-08 08:01:00 NaN
2014-01-08 08:01:12 3
2014-01-08 08:01:52 7
2014-01-08 08:02:02 9

How do I linearly interpolate (impute) the NaN value based only on its two immediate neighbors, here 1 and 3 (i.e. the values directly above and below it)? The DataFrame is huge and can run up to 2 GB, so speed is important for this operation.

Thanks

Brad Solomon · Accepted Answer

Edit: you updated your question to interpolate based on the timestamp differences in the index, applying those proportions to your DataFrame values.

Here's a crude way of going about that that should actually be decently quick:

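import numpy as np
import pandas as pd

ts = pd.to_datetime(['2014-01-08 08:00:42',
                     '2014-01-08 08:01:00',
                     '2014-01-08 08:01:12',
                     '2014-01-08 08:01:52',
                     '2014-01-08 08:02:02'])

df = pd.DataFrame([1, np.nan, 3, 7, 9], index=ts, columns=['value'])

# Seconds elapsed since the previous row (NaN for the first row)
td = df.index.to_series().diff().dt.total_seconds()

# previous value + (next - previous) * (time since previous / time from previous to next)
interp = df.value.shift(1) + (df.value.shift(-1) - df.value.shift(1)) \
       * td / (td.shift(-1) + td)

# Only rows that are NaN pick up the interpolated value
df['value'] = df['value'].fillna(interp)
print(df)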
                     value
2014-01-08 08:00:42    1.0
2014-01-08 08:01:00    2.2
2014-01-08 08:01:12    3.0
2014-01-08 08:01:52    7.0
2014-01-08 08:02:02    9.0

In this example, the NaN at the second position is 18 seconds after the previous timestamp and 12 seconds before the next one. Therefore it should be filled with the value 1 + (3 - 1) * (18 / (18 + 12)) = 2.2.
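As a cross-check on that arithmetic, here is a small sketch comparing the hand calculation with pandas' built-in time-weighted interpolation, `Series.interpolate(method='time')`. This is not part of the approach above, just a sanity check. One caveat, stated as an assumption about your data: the two only agree when each NaN is isolated. For a run of consecutive NaNs, the built-in fills the whole run from the nearest valid values on either side, whereas the shift-based formula leaves those rows as NaN.

```python
import numpy as np
import pandas as pd

ts = pd.to_datetime(['2014-01-08 08:00:42',
                     '2014-01-08 08:01:00',
                     '2014-01-08 08:01:12',
                     '2014-01-08 08:01:52',
                     '2014-01-08 08:02:02'])
df = pd.DataFrame([1, np.nan, 3, 7, 9], index=ts, columns=['value'])

# Hand calculation: 18 s since the previous row, 12 s until the next row.
by_hand = 1 + (3 - 1) * (18 / (18 + 12))

# Built-in interpolation weighted by the DatetimeIndex spacing.
builtin = df['value'].interpolate(method='time')

print(by_hand)
print(builtin.iloc[1])
```

If every gap in your data is a single NaN, the built-in may be all you need; it is worth timing both on a sample of the real data.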

You may be able to simplify that math as well.

One other thing: While working on this answer, I asked a question of my own about calculating the timedeltas. @piRSquared has a faster solution here if speed matters to that extent to you.
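If speed matters, one option is to skip the intermediate shifted Series and work on the underlying NumPy arrays. The following is a minimal sketch under my own assumptions, not the solution referred to above: `fill_isolated_nans` is a hypothetical helper name, the index is assumed to be a sorted `DatetimeIndex`, and a NaN is only filled when both of its immediate neighbors are present.

```python
import numpy as np
import pandas as pd

def fill_isolated_nans(series):
    """Fill NaNs from their two immediate neighbours, weighted by time.

    Hypothetical helper: assumes a sorted DatetimeIndex.
    """
    v = series.to_numpy(dtype=float)
    # Timestamps as integers; only ratios are used, so the unit cancels out.
    t = series.index.values.astype('int64')
    out = v.copy()

    # Interior positions holding NaN (first and last rows have only one neighbour).
    idx = np.flatnonzero(np.isnan(v[1:-1])) + 1

    prev_v, next_v = v[idx - 1], v[idx + 1]
    frac = (t[idx] - t[idx - 1]) / (t[idx + 1] - t[idx - 1])

    # If either neighbour is NaN the result is NaN, so the row stays unfilled.
    out[idx] = prev_v + (next_v - prev_v) * frac
    return pd.Series(out, index=series.index, name=series.name)

ts = pd.to_datetime(['2014-01-08 08:00:42',
                     '2014-01-08 08:01:00',
                     '2014-01-08 08:01:12',
                     '2014-01-08 08:01:52',
                     '2014-01-08 08:02:02'])
df = pd.DataFrame([1, np.nan, 3, 7, 9], index=ts, columns=['value'])

filled = fill_isolated_nans(df['value'])
print(filled)
```

This only touches the NaN positions rather than computing an interpolated value for every row, which may help when NaNs are sparse. Whether it is actually faster on a 2 GB frame is something to measure rather than assume.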
