How to fill missing data in a data frame based on grouped objects?

Question

I have a dataset with some columns that I use for grouping. Other numerical columns in the same dataset have some missing values. I want to fill the missing values of a column with the mean of the group in which the missing entry lies.

    Name of the pandas DataFrame: data
    Columns on which the groups are based: ['A', 'B']
    Column to be imputed with group-based means: ['C']

jezrael · Accepted Answer

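I think you can use groupby with transform. transform returns a result with the same index as the original rows, so it can be assigned straight back to the column: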

import pandas as pd
import numpy as np

df = pd.DataFrame([[1,1,3],
                   [1,1,9],
                   [1,1,np.nan],
                   [2,2,8],
                   [2,1,4],
                   [2,2,np.nan],
                   [2,2,5]],
                  columns=list('ABC'))
print(df)
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  NaN
3  2  2  8.0
4  2  1  4.0
5  2  2  NaN
6  2  2  5.0

df['C'] = df.groupby(['A', 'B'])['C'].transform(lambda x: x.fillna(x.mean()))
print(df)
   A  B    C
0  1  1  3.0
1  1  1  9.0
2  1  1  6.0
3  2  2  8.0
4  2  1  4.0
5  2  2  6.5
6  2  2  5.0
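
As a possible variant (not part of the original answer), the lambda can be avoided by computing the per-row group mean with transform('mean') and passing that Series to fillna. This is only a sketch on the same sample data; the names group_means and filled are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 3],
                   [1, 1, 9],
                   [1, 1, np.nan],
                   [2, 2, 8],
                   [2, 1, 4],
                   [2, 2, np.nan],
                   [2, 2, 5]],
                  columns=list('ABC'))

# Mean of C within each (A, B) group, broadcast back to every row.
# NaN values are skipped when the mean is computed.
group_means = df.groupby(['A', 'B'])['C'].transform('mean')

# fillna aligns on the index, so only the missing entries are replaced;
# existing values of C are left untouched.
filled = df['C'].fillna(group_means)
print(filled)
```

Note that with either approach, a group whose C values are all missing has no mean to fall back on, so its entries stay NaN.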
