s nandan
s nandan

Reputation: 109

Pandas DataFrame : Using Pandas Replace NaN Values with Average of above 3 rows

I am having Pandas DataFrame in that I have to replace NaN values from the Order_Values column by applying mean for the above 3 rows where NaN values are present by groupby Id column

Input Dataframe   
        Id     order_values
       1002       45
       1002       36
       1002       18
       1002       NaN    
       1002       NaN   
       1002       72
       1003       68
       1003       54
       1003       45
       1003       NaN
       1003       NaN
       1004       14
       1004       50
       1004       27
       1004       NaN


Output Dataframe   
        Id     order_values
       1002       45
       1002       36
       1002       18
       1002       33    [Calculation 18+36+45=99/3= 33 ]
       1002       29   [Calculation 33+18+36=99/3= 29 ]
       1002       72
       1003       68
       1003       54
       1003       45
       1003       55.6  [Calculation 45+54+68=167/3= 55.6  ]
       1003       51.53  [Calculation 45+54+68=154.6/3= 51.53 ]
       1004       14
       1004       50
       1004       27
       1004       30.33   [Calculation 27+50+14=91/3= 30.33 ]

Upvotes: 2

Views: 142

Answers (3)

SomeDude
SomeDude

Reputation: 14238

You can do like this:

s = df['order_values'].copy()
for i in range(3, len(s)):
    s.iloc[i] = s.iloc[i-3:i].mean() if pd.isna(s.iloc[i]) else s.iloc[i]

df['order_values'] = s

print(df):

      Id  order_values
0   1002     45.000000
1   1002     36.000000
2   1002     18.000000
3   1002     33.000000
4   1002     29.000000
5   1002     72.000000
6   1003     68.000000
7   1003     54.000000
8   1003     45.000000
9   1003     55.666667
10  1003     51.555556
11  1004     14.000000
12  1004     50.000000
13  1004     27.000000
14  1004     30.333333

If you want it do it by groupby Id then you could write the above lines in a function and use groupby and transform like:

def fill_na_in_order_values(s):
    for i in range(3, len(s)):
        s.iloc[i] = s.iloc[i-3:i].mean() if pd.isna(s.iloc[i]) else s.iloc[i]
    return s

df['order_values'] = df.groupby('Id')['order_values'].transform(fill_na_in_order_values)

Upvotes: 2

Manualmsdos
Manualmsdos

Reputation: 1547

Something like this?:

df = pd.DataFrame({
    'id':[ 1002 ,1002 ,1002 ,1002, 1002,1002 ,1003 ,1003 ,1003 ,1003, 1003, 1004, 1004, 1004, 1004],
    'order_values': [45, 36, 18, None ,None, 72, 68, 54, 45, None, None, 14, 50, 27, None]
})


seq = list(range(len(df)))
window_size = 3

df['result'] = df['order_values']
df['seq'] = None

for i in range(len(df) - window_size):
    if pd.isna(df.iloc[i + window_size, 1]):
        df.iloc[i + window_size, 2] = df.iloc[i: i + window_size, 2].mean()
        
        df.iloc[i + window_size, 3] = str(seq[i: i + window_size])

df:

    id  order_values    result  seq
0   1002    45.0    45.000000   None
1   1002    36.0    36.000000   None
2   1002    18.0    18.000000   None
3   1002    NaN     33.000000   [0, 1, 2]
4   1002    NaN     29.000000   [1, 2, 3]
5   1002    72.0    72.000000   None
6   1003    68.0    68.000000   None
7   1003    54.0    54.000000   None
8   1003    45.0    45.000000   None
9   1003    NaN     55.666667   [6, 7, 8]
10  1003    NaN     51.555556   [7, 8, 9]
11  1004    14.0    14.000000   None
12  1004    50.0    50.000000   None
13  1004    27.0    27.000000   None
14  1004    NaN     30.333333   [11, 12, 13]

Upvotes: 0

Jason Leaver
Jason Leaver

Reputation: 314

you could try using shift

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "id": [1002, 1002, 1002, 1002, 1002, 1002, 1003, 1003, 1003, 1003, 1003, 1004, 1004, 1004, 1004],
        "order_values": [45, 36, 18, None, None, 72, 68, 54, 45, None, None, 14, 50, 27, None],
    }
)
order = df["order_values"]
mask = order.isnull()

a = np.array([order[~mask].shift(periods=i) for i in range(1, 4)]).T[3:-2]

df.loc[mask, "order_values"] = np.sum(a, axis=1)
print(df)
      id  order_values
0   1002          45.0
1   1002          36.0
2   1002          18.0
3   1002          99.0
4   1002         126.0
5   1002          72.0
6   1003          68.0
7   1003          54.0
8   1003          45.0
9   1003         158.0
10  1003         194.0
11  1004          14.0
12  1004          50.0
13  1004          27.0
14  1004         167.0

Upvotes: 0

Related Questions