BobbyJohnsonOG

Reputation: 141

Interpolating backwards with multiple consecutive nan's in Pandas/Python?

I have an array with missing values in various places.

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)

0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
5    6.0
6    NaN
7    8.0
8    9.0
dtype: float64

For each NaN, I want to take the value following it and divide it by two, then propagate that result to the next consecutive NaN (moving backwards), so I would end up with:

0    0.75
1    1.5
2    3.0
3    4.0
4    5.0
5    6.0
6    4.0
7    8.0
8    9.0
dtype: float64

I've tried df.interpolate(), but that doesn't seem to work with consecutive NaN's.

Upvotes: 3

Views: 1496

Answers (2)

jezrael

Reputation: 862901

Another solution uses ffill() (the same as fillna with method='ffill') together with bfill():

# reverse the order of the Series and flag the NaNs
b = df[::-1].isnull()
# count the position of each NaN within its consecutive run, multiply by 2 and replace 0 with 1
a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})

print(a)
8    1
7    1
6    2
5    1
4    1
3    1
2    1
1    2
0    4
dtype: int32

print(df.bfill().div(a))
0    0.75
1    1.50
2    3.00
3    4.00
4    5.00
5    6.00
6    4.00
7    8.00
8    9.00
dtype: float64
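
Note that multiplying the run position by 2 matches the halving rule only for runs of at most two NaNs (divisors 2 and 4); a run of three would need a divisor of 8, not 6. A minimal sketch of a variant that raises 2 to the run position instead, assuming the same reversed-cumsum counting as above. The function name `halve_back` and the sample Series `s` are hypothetical and only for illustration:

```python
import numpy as np
import pandas as pd

def halve_back(s):
    # hypothetical helper, not part of the original answer
    b = s[::-1].isnull()
    c = b.cumsum()
    # position of each NaN inside its run, counted from the next valid value
    # (0 for entries that are not NaN)
    n = c - c.where(~b).ffill()
    # divisor 2**n: 1 for valid values, then 2, 4, 8, ... along a NaN run
    return s.bfill().div(2.0 ** n)

# sample data with a run of three NaNs (made up for this sketch)
s = pd.Series([np.nan, np.nan, np.nan, 8.0, np.nan, 6.0])
result = halve_back(s)
print(result)
```

Here the leading run of three NaNs is filled with 1.0, 2.0 and 4.0 (8 divided by 8, 4 and 2), and the single NaN before 6.0 becomes 3.0.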

Timings (len(df)=9k):

In [315]: %timeit (mat(df))
100 loops, best of 3: 11.3 ms per loop

In [316]: %timeit (jez(df1))
100 loops, best of 3: 2.52 ms per loop

Code for timings:

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()

def jez(df):
    b = df[::-1].isnull()
    a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})
    return (df.bfill().div(a))

def mat(df):
    prev = 0
    new_list = []
    for i in df.values[::-1]:
        if np.isnan(i):
            new_list.append(prev/2.)    
            prev = prev / 2.
        else:
            new_list.append(i)
            prev = i
    return pd.Series(new_list[::-1])

print (mat(df))
print (jez(df1))

Upvotes: 3

Mathias711

Reputation: 6658

You can do something like this:

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)

prev = 0
new_list = []
for i in df.values[::-1]:
    if np.isnan(i):
        new_list.append(prev/2.)    
        prev = prev / 2.
    else:
        new_list.append(i)
        prev = i
df = pd.Series(new_list[::-1])

It loops over the values of the Series in reverse, keeping track of the previous value. If the current value is not NaN, it is appended as is; otherwise half of the previous value is appended, and that halved value becomes the new previous value.

This might not be the most sophisticated Pandas solution, but you can change the behavior quite easily.
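
As one illustration of changing the behavior, the loop could be wrapped in a function with an adjustable multiplier. This is only a sketch: the name `fill_back` and the `factor` parameter are hypothetical, and it keeps the starting value of 0 from the loop above, so trailing NaNs become 0.

```python
import numpy as np
import pandas as pd

def fill_back(s, factor=0.5):
    # hypothetical helper; `factor` is an assumed parameter (0.5 halves the value)
    prev = 0  # same starting value as the loop above: trailing NaNs become 0
    out = []
    # walk the values from last to first
    for v in s.values[::-1]:
        if np.isnan(v):
            # scale the most recent value and carry it on to the next NaN
            prev = prev * factor
        else:
            prev = v
        out.append(prev)
    # restore the original order and keep the original index
    return pd.Series(out[::-1], index=s.index)

x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
halved = fill_back(pd.Series(x))
quartered = fill_back(pd.Series(x), factor=0.25)
print(halved)
```

With the default `factor=0.5` this gives the same values as the loop above; `factor=0.25` would quarter the following value instead.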

Upvotes: 2
