Reputation: 303
I have a dataset with some columns which I am using for grouping the database.I have some other numerical columns in the same dataset with some missing values. I want to fill the missing values of a column with the mean of the group in which the missing entry lies.
Name of Pandas dataset=data
Col on which groups would be based=['A','B']
Col that needs to be imputed with group based means: ['C']
Upvotes: 1
Views: 393
Reputation: 450
[df[i].fillna(df[i].mean(),inplace=True) for i in df.columns ]
This fills then NAN from column C with 5.8 which is the mean of columns 'C'
Output
print df
A B C
0 1 1 3.0
1 1 1 9.0
2 1 1 5.8
3 2 2 8.0
4 2 1 4.0
5 2 2 5.8
6 2 2 5.0
Upvotes: 0
Reputation: 863711
I think you can use groupby
with transform
:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,1,3],
[1,1,9],
[1,1,np.nan],
[2,2,8],
[2,1,4],
[2,2,np.nan],
[2,2,5]]
, columns=list('ABC'))
print df
A B C
0 1 1 3.0
1 1 1 9.0
2 1 1 NaN
3 2 2 8.0
4 2 1 4.0
5 2 2 NaN
6 2 2 5.0
df['C'] = df.groupby(['A', 'B'])['C'].transform(lambda x: x.fillna( x.mean() ))
print df
A B C
0 1 1 3.0
1 1 1 9.0
2 1 1 6.0
3 2 2 8.0
4 2 1 4.0
5 2 2 6.5
6 2 2 5.0
Upvotes: 2