elnino
elnino

Reputation: 173

Replace NaN values in list of cols with mean of other list of cols

I have this dataframe with NaNs:

  Col1 Col2  Col3  Col4
0    A    c   1.0   2.0
1    A    c   3.0   6.0
2    B    c   5.0   NaN
3    A    d   6.0   4.0
4    A    c   NaN   1.0
5    A    d   NaN   3.0
6    B    c   5.0   4.0

I want to replace the missing values present in columns with labels in the list cols

The value to be replaced is computed as the mean of the non-missing values of the corresponding group. Groups are formed based on the values in the columns with labels in the list condition_cols

For example:

condition_cols = ['Col1', 'Col2']

cols = ['Col3', 'Col4']

After replacing the NaNs the dataframe would look like this:

  Col1 Col2  Col3  Col4
0    A    c   1.0   2.0
1    A    c   3.0   6.0
2    B    c   5.0   4.0
3    A    d   6.0   4.0
4    A    c   2.0   1.0
5    A    d   6.0   3.0
6    B    c   5.0   4.0

I tried taking the mean of duplicates of condition_cols but the calculated mean is for every group not specific to any group.

Upvotes: 3

Views: 136

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

You can groupby the dataframe on condition_cols and transform using mean, finally use fillna to fill the missing values:

df = df.fillna(df.groupby(condition_cols).transform('mean'))

Result:

  Col1 Col2  Col3  Col4
0    A    c   1.0   2.0
1    A    c   3.0   6.0
2    B    c   5.0   4.0
3    A    d   6.0   4.0
4    A    c   2.0   1.0
5    A    d   6.0   3.0
6    B    c   5.0   4.0

Upvotes: 6

Related Questions