Maryam Mirfazaelian
Maryam Mirfazaelian

Reputation: 47

Filling NaN in each row with the mean of existing values except its NaNs

I am a beginner in the basics of data analysis and have a problem that I could not solve it with my current knowledge and searching on various sites, especially the StackOverflow site. I have a sample data frame like the one below and I need to fill NaN with a specific condition:

df = pd.DataFrame([['row 1', 1, 2, 3, 4], ['row 2', 5, nan, 7, 8], ['row 3', nan, nan, 11, 12]], index=['A', 'B', 'C'], columns=['rows', '1st', '2nd', '3th', '4th'])
df
rows 1st 2nd 3rd 4th
A 1.0 2.0 3 4
B 5.0 NaN >> fill with (5.0+7+8) / 3 = 6.66... 7 8
C NaN >> fill with (11+12) / 2 = 11.5 NaN >> fill with (11+12) / 2 = 11.5 11 12

In each row that contains NaN, I have some numbers. To fill NaN I have to calculate their values with this formula: (sum of existing numbers in the row) / (number of existing numbers in the row). For example in row 'B', the NaN should fill with the shown formula. Thank you in advance for any advice.

I tried this code in notebook but my data frame didn't changed:

df.fillna((df[df.isna() == False].sum(axis=1)) / (4 - df.isna().sum(axis=1)), inplace=True)

Upvotes: 2

Views: 126

Answers (2)

Anoushiravan R
Anoushiravan R

Reputation: 21908

You could also try the following solution:

import pandas as pd

df.where(~ df.isna(), df.iloc[:, 1:].mean(axis=1), axis=0)

    rows   1st        2nd  3th  4th
A  row 1   1.0   2.000000    3    4
B  row 2   5.0   6.666667    7    8
C  row 3  11.5  11.500000   11   12

Or you could also try thanks to a very good suggestion by @Corralien:

df.where(~ df.isna(), df.mean(axis=1, numeric_only=True), axis=0)

Upvotes: 3

Corralien
Corralien

Reputation: 120399

You can compute the mean along the column axis then fill nan values:

mean = df.mean(axis=1, numeric_only=True)
out = df.T.fillna(mean).T

Output:

>>> out
    rows   1st       2nd 3th 4th
A  row 1   1.0       2.0   3   4
B  row 2   5.0  6.666667   7   8
C  row 3  11.5      11.5  11  12

>>> mean
A     2.500000
B     6.666667
C    11.500000
dtype: float64

Update: a shortest solution inspired by @anoushiravan-r:

>>> df.mask(df.isna(), df.mean(axis=1, numeric_only=True), axis=0)
    rows   1st        2nd  3th  4th
A  row 1   1.0   2.000000    3    4
B  row 2   5.0   6.666667    7    8
C  row 3  11.5  11.500000   11   12

Upvotes: 4

Related Questions