Reputation: 13
I am trying to interpolate some data containing NaNs. I would like to fill only runs of 1-3 consecutive NaNs and leave longer runs untouched, but I cannot figure out how to do so with pandas' Series.interpolate().
import numpy as np
import pandas as pd
data_chunk = np.array([np.nan, np.nan, np.nan, 4, 5, np.nan, np.nan, np.nan, np.nan, 10, np.nan, np.nan, np.nan, 14])
data_chunk = pd.DataFrame(data_chunk)[0]
print(data_chunk)
print(data_chunk.interpolate(method='linear', limit_direction='both', limit=3, limit_area='inside'))
Original data:
0 NaN
1 NaN
2 NaN
3 4.0
4 5.0
5 NaN
6 NaN
7 NaN
8 NaN
9 10.0
10 NaN
11 NaN
12 NaN
13 14.0
Attempt at interpolating:
0 NaN
1 NaN
2 NaN
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 11.0
11 12.0
12 13.0
13 14.0
Expected result:
0 NaN
1 NaN
2 NaN
3 4.0
4 5.0
5 NaN
6 NaN
7 NaN
8 NaN
9 10.0
10 11.0
11 12.0
12 13.0
13 14.0
Any help would be appreciated :)
Upvotes: 1
Views: 51
Reputation: 586
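Create a boolean mask to see which NaN groups have fewer than 4 consecutive NaNs (runs of non-NaN values are labelled the same way, which is harmless because interpolation leaves them unchanged):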
mask = (data_chunk.notnull() != data_chunk.shift().notnull()).cumsum().reset_index().groupby(0).transform('count') < 4
Select the interpolated values where the mask is True and keep the original values otherwise (interpolated is presumably the result of the interpolate call in the question):
pd.concat([interpolated[mask.values[:,0] ==True], data_chunk[mask.values[:,0] == False]]).sort_index()
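The same idea can be sketched as a small helper. This is only one possible way to write it, not necessarily how the snippets above were meant to be used: the names interpolate_short_gaps and max_gap are made up for illustration, and it swaps pd.concat for Series.where to pick between the filled and the original values.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(s: pd.Series, max_gap: int = 3) -> pd.Series:
    """Linearly fill interior NaN runs of at most max_gap values (hypothetical helper)."""
    is_na = s.isna()
    # Give every run of consecutive NaN / non-NaN flags its own id.
    run_id = (is_na != is_na.shift()).cumsum()
    # Length of the run that each element belongs to.
    run_len = is_na.groupby(run_id).transform("size")
    # Fill every interior gap first; leading and trailing NaNs stay NaN.
    filled = s.interpolate(method="linear", limit_area="inside")
    # Keep the filled value only inside short NaN runs, else the original.
    return filled.where(is_na & (run_len <= max_gap), s)


data_chunk = pd.Series(
    [np.nan, np.nan, np.nan, 4, 5, np.nan, np.nan, np.nan, np.nan, 10,
     np.nan, np.nan, np.nan, 14]
)
print(interpolate_short_gaps(data_chunk))
```

The limit argument alone does not give this behaviour, because it caps how many NaNs are filled in each direction but still fills part of a longer gap. Measuring the run lengths first avoids that.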
Upvotes: 2