Ved Gupta
Ved Gupta

Reputation: 303

How to fill missing data in a data frame based on grouped objects?

I have a dataset with some columns which I am using for grouping the database.I have some other numerical columns in the same dataset with some missing values. I want to fill the missing values of a column with the mean of the group in which the missing entry lies.

    Name of Pandas dataset=data
    Col on which groups would be based=['A','B']
    Col that needs to be imputed with group based means: ['C'] 

Upvotes: 1

Views: 393

Answers (2)

5nv
5nv

Reputation: 450

[df[i].fillna(df[i].mean(),inplace=True) for i in df.columns ]

This fills then NAN from column C with 5.8 which is the mean of columns 'C'

Output    
    print df  
      A  B    C
    0  1  1  3.0
    1  1  1  9.0
    2  1  1  5.8
    3  2  2  8.0
    4  2  1  4.0
    5  2  2  5.8
    6  2  2  5.0  

Upvotes: 0

jezrael
jezrael

Reputation: 863711

I think you can use groupby with transform:

import pandas as pd
import numpy as np

df = pd.DataFrame([[1,1,3],
                   [1,1,9],
                   [1,1,np.nan],
                   [2,2,8],
                   [2,1,4],
                   [2,2,np.nan],
                   [2,2,5]]
                   , columns=list('ABC'))
print df
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  NaN
3  2  2  8.0
4  2  1  4.0
5  2  2  NaN
6  2  2  5.0

df['C'] = df.groupby(['A', 'B'])['C'].transform(lambda x: x.fillna( x.mean() ))
print df
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  6.0
3  2  2  8.0
4  2  1  4.0
5  2  2  6.5
6  2  2  5.0

Upvotes: 2

Related Questions