Reputation: 173
I have this dataframe with NaNs:
Col1 Col2 Col3 Col4
0 A c 1.0 2.0
1 A c 3.0 6.0
2 B c 5.0 NaN
3 A d 6.0 4.0
4 A c NaN 1.0
5 A d NaN 3.0
6 B c 5.0 4.0
I want to replace the missing values present in columns with labels in the list cols
The value to be replaced is computed as the mean of the non-missing values of the corresponding group. Groups are formed based on the values in the columns with labels in the list condition_cols
For example:
condition_cols = ['Col1', 'Col2']
cols = ['Col3', 'Col4']
After replacing the NaNs the dataframe would look like this:
Col1 Col2 Col3 Col4
0 A c 1.0 2.0
1 A c 3.0 6.0
2 B c 5.0 4.0
3 A d 6.0 4.0
4 A c 2.0 1.0
5 A d 6.0 3.0
6 B c 5.0 4.0
I tried taking the mean of duplicates of condition_cols
but the calculated mean is for every group not specific to any group.
Upvotes: 3
Views: 136
Reputation: 71689
You can groupby
the dataframe on condition_cols
and transform
using mean
, finally use fillna
to fill the missing values:
df = df.fillna(df.groupby(condition_cols).transform('mean'))
Result:
Col1 Col2 Col3 Col4
0 A c 1.0 2.0
1 A c 3.0 6.0
2 B c 5.0 4.0
3 A d 6.0 4.0
4 A c 2.0 1.0
5 A d 6.0 3.0
6 B c 5.0 4.0
Upvotes: 6