Michael

Reputation: 13

Prevent pandas interpolate from extrapolating

I am trying to interpolate some data containing NaNs. I would like to fill only runs of 1-3 consecutive NaNs and leave longer runs untouched, but I cannot figure out how to do so with Series.interpolate().

import numpy as np
import pandas as pd

data_chunk = np.array([np.nan, np.nan, np.nan, 4, 5, np.nan, np.nan, np.nan, np.nan, 10, np.nan, np.nan, np.nan, 14])
data_chunk = pd.DataFrame(data_chunk)[0]
print(data_chunk)
print(data_chunk.interpolate(method='linear', limit_direction='both', limit=3, limit_area='inside'))

Original data:

0      NaN
1      NaN
2      NaN
3      4.0
4      5.0
5      NaN
6      NaN
7      NaN
8      NaN
9     10.0
10     NaN
11     NaN
12     NaN
13    14.0

Attempt at interpolating:

0      NaN
1      NaN
2      NaN
3      4.0
4      5.0
5      6.0
6      7.0
7      8.0
8      9.0
9     10.0
10    11.0
11    12.0
12    13.0
13    14.0

Expected result:

0      NaN
1      NaN
2      NaN
3      4.0
4      5.0
5      NaN
6      NaN
7      NaN
8      NaN
9     10.0
10    11.0
11    12.0
12    13.0
13    14.0

Any help would be appreciated :)

Upvotes: 1

Views: 51

Answers (1)

Rik Kraan

Reputation: 586

Create a boolean mask that flags every run (of NaNs or of values) shorter than 4 elements, so NaN gaps of 1-3 are flagged and longer gaps are not.

mask = (data_chunk.notnull() != data_chunk.shift().notnull()).cumsum().reset_index().groupby(0).transform('count') < 4
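To see what that one-liner computes, here is a sketch that splits it into named steps on the question's data. The names run_id, run_len and short_run are hypothetical, introduced only for illustration; the logic mirrors the mask above but keeps everything as a plain Series instead of going through reset_index().

```python
import numpy as np
import pandas as pd

data_chunk = pd.Series([np.nan, np.nan, np.nan, 4, 5, np.nan, np.nan, np.nan, np.nan,
                        10, np.nan, np.nan, np.nan, 14])

# A new run starts wherever the missing/non-missing state changes;
# the cumulative sum turns those change points into run ids
run_id = (data_chunk.notnull() != data_chunk.shift().notnull()).cumsum()
# Length of the run each position belongs to
run_len = run_id.groupby(run_id).transform('count')
# Positions in runs shorter than 4 may take interpolated values
short_run = run_len < 4

print(pd.DataFrame({'value': data_chunk, 'run_id': run_id,
                    'run_len': run_len, 'short_run': short_run}))
```

The 4-NaN gap at positions 5-8 gets run length 4, so it is the only NaN run excluded from short_run.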

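Interpolate all inside gaps, then take the interpolated values where the mask is True and keep the original values elsewhere.

interpolated = data_chunk.interpolate(method='linear', limit_area='inside')
pd.concat([interpolated[mask.values[:, 0]], data_chunk[~mask.values[:, 0]]]).sort_index()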
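The same idea can be wrapped in a small reusable function. This is a sketch, not the answerer's code: interpolate_short_gaps and max_gap are hypothetical names, and it swaps the concat/sort_index step for Series.mask, which replaces values in place and so needs no re-sorting.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(s: pd.Series, max_gap: int = 3) -> pd.Series:
    """Linearly interpolate runs of at most `max_gap` consecutive NaNs that
    lie between two valid values; longer runs stay NaN."""
    is_nan = s.isna()
    # A new run starts wherever the missing/non-missing state changes
    run_id = (s.notna() != s.shift().notna()).cumsum()
    # Length of the run each position belongs to
    run_len = run_id.groupby(run_id).transform('count')
    # Only NaNs that belong to a short gap should be filled
    short_gap = is_nan & (run_len <= max_gap)
    # Fill every inside gap, then copy values over only at short-gap positions
    filled = s.interpolate(method='linear', limit_area='inside')
    return s.mask(short_gap, filled)


data_chunk = pd.Series([np.nan, np.nan, np.nan, 4, 5, np.nan, np.nan, np.nan, np.nan,
                        10, np.nan, np.nan, np.nan, 14])
print(interpolate_short_gaps(data_chunk, max_gap=3))
```

Leading and trailing NaNs stay NaN because limit_area='inside' never fills them, which matches the expected result in the question.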

Upvotes: 2
