Benmo751

Reputation: 83

How to limit pandas interpolation when there is more NaN than the limit

I would like to interpolate a pandas DataFrame, but I could not find a simple solution to my issue. Here is my DataFrame:

import numpy
import pandas
df = pandas.DataFrame(numpy.array([numpy.nan, 1, 1, numpy.nan,
    numpy.nan, 1, numpy.nan, numpy.nan, numpy.nan, numpy.nan, 1]), columns=['a'])

      a
0   NaN
1   1.0
2   1.0
3   NaN
4   NaN
5   1.0
6   NaN
7   NaN
8   NaN
9   NaN
10  1.0

I want the following result:

      a
0   NaN
1   1.0
2   1.0
3   1.0
4   1.0
5   1.0
6   NaN
7   NaN
8   NaN
9   NaN
10  1.0

To do that, I use the interpolate function, but it always fills the first NaN values of a gap, even when the gap is longer than the limit:

df2 = df.interpolate(limit=2, limit_area='inside', method='linear')
      a
0   NaN
1   1.0
2   1.0
3   1.0
4   1.0
5   1.0
6   1.0
7   1.0
8   NaN
9   NaN
10  1.0

Is it possible to interpolate ONLY if there is a non-NaN value within the range of the limit parameter? (i.e., the values for rows 6 and 7 should be NaN because row 8 is also NaN.) I have the impression that limit_area only applies to the ends of the df.

Thanks

Upvotes: 7

Views: 1878

Answers (1)

ALollz

Reputation: 59549

To achieve what you want, first create a Series that counts the number of consecutive NaN values in each group of NaN and broadcasts that count back to each row in the group. Then interpolate the entire Series and use mask to set everything that shouldn't have been interpolated back to NaN.

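# Give each run of consecutive null / non-null values its own id
s = df['a'].notnull()
s = s.ne(s.shift()).cumsum()

# Size of each run, kept only on the NaN rows
m = df.groupby([s, df['a'].isnull()])['a'].transform('size').where(df['a'].isnull())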
#0     1.0
#1     NaN
#2     NaN
#3     2.0
#4     2.0
#5     NaN
#6     4.0
#7     4.0
#8     4.0
#9     4.0
#10    NaN

# Interpolate every interior gap, then mask the gaps longer than 2 again
df.interpolate(limit_area='inside', method='linear').mask(m > 2)

      a
0   NaN
1   1.0
2   1.0
3   1.0
4   1.0
5   1.0
6   NaN
7   NaN
8   NaN
9   NaN
10  1.0
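
If you need this for several columns or different limits, the same idea can be wrapped in a small helper. This is only a sketch: the function name interpolate_short_gaps and the parameter max_gap are made up for illustration and are not part of pandas. It works on a single Series and assumes linear interpolation with limit_area='inside', as in the question.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(series, max_gap):
    """Linearly fill interior NaN runs of at most `max_gap` values.

    Hypothetical helper, not a pandas API. Longer runs are left
    entirely as NaN instead of being partially filled.
    """
    is_nan = series.isna()
    # Give each run of consecutive NaN / non-NaN rows its own id.
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    # Broadcast the length of each run back to every row in it.
    run_len = is_nan.groupby(run_id).transform('size')
    # Rows that belong to a NaN run longer than the allowed gap.
    too_long = is_nan & (run_len > max_gap)
    # Interpolate all interior gaps, then blank out the long ones again.
    filled = series.interpolate(method='linear', limit_area='inside')
    return filled.mask(too_long)


nan = np.nan
s = pd.Series([nan, 1, 1, nan, nan, 1, nan, nan, nan, nan, 1])
result = interpolate_short_gaps(s, max_gap=2)
print(result.tolist())
# [nan, 1.0, 1.0, 1.0, 1.0, 1.0, nan, nan, nan, nan, 1.0]
```

For a whole DataFrame, something like df.apply(interpolate_short_gaps, max_gap=2) should apply it column by column.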

Upvotes: 5
