Ali
Ali

Reputation: 8100

Pandas impute Null with average of previous and next value in the row

I have a dataframe with several Nulls scattered here and there. I want to impute the value of the Nulls with the average of previous and next value in the same row, but only for the nulls which have a numeric value on their left and right columns.

In the below example, I would like to impute only the Nan in the 2nd row and column c2

import pandas as pd
import numpy as np

df = pd.DataFrame([[1, 2, 3, 4], [5,np.nan,np.nan,8], [9,np.nan,11,np.nan]], columns=['c1', 'c2', 'c3', 'c4'])

Out

   c1   c2    c3   c4
0   1  2.0   3.0  4.0
1   5  NaN   NaN  8.0
2   9  NaN  11.0  NaN

Will becomes

   c1   c2    c3   c4
0   1  2.0   3.0  4.0
1   5  NaN   NaN  8.0
2   9  10  11.0  NaN   <-- value 10 replaces Nan

Upvotes: 2

Views: 3124

Answers (1)

jezrael
jezrael

Reputation: 862531

Use DataFrame.interpolate with parameters axis=1 for procesing per rows, limit_area='inside' for processing NaNs values surrounded by valid values and limit=1 for filled maximum 1.

There is also default parameter limit_direction='forward', so if more values of NaNs only first is replaced. You can find it by another interpolate with limit_direction='backward' and testing missing values used for replace after by DataFrame.mask:

mask = df.interpolate(axis=1, limit_area='inside', limit=1, limit_direction='backward').isna()
df1 = df.interpolate(axis=1, limit_area='inside', limit=1).mask(mask)
print (df1)
    c1    c2    c3   c4
0  1.0   2.0   3.0  4.0
1  5.0   NaN   NaN  8.0
2  9.0  10.0  11.0  NaN

Detail:

print (df.interpolate(axis=1, limit_area='inside', limit=1))
    c1    c2    c3   c4
0  1.0   2.0   3.0  4.0
1  5.0   6.0   NaN  8.0
2  9.0  10.0  11.0  NaN

print (df.interpolate(axis=1, limit_area='inside', limit=1, limit_direction='backward'))
    c1    c2    c3   c4
0  1.0   2.0   3.0  4.0
1  5.0   NaN   7.0  8.0
2  9.0  10.0  11.0  NaN

Similar idea is use back filling with limit=1 for test missing values:

mask = df.bfill(axis=1, limit=1).isna()
df1 = df.interpolate(axis=1, limit_area='inside', limit=1).mask(mask)

Detail:

print (df.bfill(axis=1, limit=1))
    c1    c2    c3   c4
0  1.0   2.0   3.0  4.0
1  5.0   NaN   8.0  8.0
2  9.0  11.0  11.0  NaN

Upvotes: 3

Related Questions