Garvey

Reputation: 1309

Fill missing values with the average of the previous N rows

I want to fill each missing value with the average of the previous N rows in the same column. An example is shown below:

import numpy as np
import pandas as pd

N = 2
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list('ABCD'))

The DataFrame looks like this:

     A   B   C  D
0   NaN 2.0 NaN 0
1   3.0 4.0 NaN 1
2   NaN NaN NaN 5
3   NaN 3.0 NaN NaN

Result should be:

     A   B       C  D
0   NaN 2.0     NaN 0
1   3.0 4.0     NaN 1
2   NaN (4+2)/2 NaN 5
3   NaN 3.0     NaN (1+5)/2

Is there an elegant and fast way to achieve this without a for loop?

Upvotes: 7

Views: 833

Answers (1)

jpp

Reputation: 164623

rolling + mean + shift

Combine rolling, mean and shift. As written, the logic below leaves a gap unfilled when either of the previous two values is null; you will need to modify it if the mean of NaN and another value should be interpreted differently.

df = df.fillna(df.rolling(2).mean().shift())

print(df)

     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  3.0 NaN  5.0
3  NaN  3.0 NaN  3.0
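
One possible way to handle the case mentioned above, where only some of the previous N values are null, is the `min_periods` argument of `rolling`. The sketch below is only an illustration: `fill_with_prev_mean` and its parameter names are hypothetical, and treating "mean of NaN and another value" as "mean of the non-null values in the window" is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd


def fill_with_prev_mean(df, n=2, min_periods=None):
    """Fill NaNs with the mean of the previous n rows of the same column.

    Hypothetical helper for illustration only.
    min_periods=None keeps the strict behaviour (all n previous values
    must be non-null); min_periods=1 averages whatever non-null values
    exist in the window (an assumed interpretation).
    """
    # Rolling mean over n rows, shifted down one row so that each row
    # sees only the rows strictly before it.
    prev_mean = df.rolling(n, min_periods=min_periods).mean().shift()
    return df.fillna(prev_mean)


df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list('ABCD'))

strict = fill_with_prev_mean(df, n=2)                  # same as the one-liner above
lenient = fill_with_prev_mean(df, n=2, min_periods=1)  # tolerates NaNs in the window
print(strict)
print(lenient)
```

With `min_periods=1`, rows 2 and 3 of column A would be filled with 3.0, which differs from the result requested in the question (where A stays NaN), so choose the variant that matches your intent. In both variants the means are computed from the original values, not from values filled in earlier rows.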

Upvotes: 6
