Reputation: 8100
I have a dataframe with several Nulls scattered here and there. I want to impute the value of the Nulls with the average of previous and next value in the same row, but only for the nulls which have a numeric value on their left and right columns.
In the below example, I would like to impute only the Nan in the 2nd row and column c2
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 2, 3, 4], [5,np.nan,np.nan,8], [9,np.nan,11,np.nan]], columns=['c1', 'c2', 'c3', 'c4'])
Out
c1 c2 c3 c4
0 1 2.0 3.0 4.0
1 5 NaN NaN 8.0
2 9 NaN 11.0 NaN
Will becomes
c1 c2 c3 c4
0 1 2.0 3.0 4.0
1 5 NaN NaN 8.0
2 9 10 11.0 NaN <-- value 10 replaces Nan
Upvotes: 2
Views: 3124
Reputation: 862531
Use DataFrame.interpolate
with parameters axis=1
for procesing per rows, limit_area='inside'
for processing NaN
s values surrounded by valid values and limit=1
for filled maximum 1
.
There is also default parameter limit_direction='forward'
, so if more values of NaN
s only first is replaced. You can find it by another interpolate
with limit_direction='backward'
and testing missing values used for replace after by DataFrame.mask
:
mask = df.interpolate(axis=1, limit_area='inside', limit=1, limit_direction='backward').isna()
df1 = df.interpolate(axis=1, limit_area='inside', limit=1).mask(mask)
print (df1)
c1 c2 c3 c4
0 1.0 2.0 3.0 4.0
1 5.0 NaN NaN 8.0
2 9.0 10.0 11.0 NaN
Detail:
print (df.interpolate(axis=1, limit_area='inside', limit=1))
c1 c2 c3 c4
0 1.0 2.0 3.0 4.0
1 5.0 6.0 NaN 8.0
2 9.0 10.0 11.0 NaN
print (df.interpolate(axis=1, limit_area='inside', limit=1, limit_direction='backward'))
c1 c2 c3 c4
0 1.0 2.0 3.0 4.0
1 5.0 NaN 7.0 8.0
2 9.0 10.0 11.0 NaN
Similar idea is use back filling with limit=1
for test missing values:
mask = df.bfill(axis=1, limit=1).isna()
df1 = df.interpolate(axis=1, limit_area='inside', limit=1).mask(mask)
Detail:
print (df.bfill(axis=1, limit=1))
c1 c2 c3 c4
0 1.0 2.0 3.0 4.0
1 5.0 NaN 8.0 8.0
2 9.0 11.0 11.0 NaN
Upvotes: 3