Ryan

Reputation: 640

Pandas fillna() not working as expected

I'm trying to replace NaN values in my dataframe with means from the same row.

import numpy as np
import pandas as pd

sample_df = pd.DataFrame({'A': [1.0, np.nan, 5.0],
                          'B': [1.0, 4.0, 5.0],
                          'C': [1.0, 1.0, 4.0],
                          'D': [6.0, 5.0, 5.0],
                          'E': [1.0, 1.0, 4.0],
                          'F': [1.0, np.nan, 4.0]})

# Mean of the non-NaN values in each row
sample_mean = sample_df.apply(lambda x: np.mean(x.dropna().values.tolist()), axis=1)

Produces:

0    1.833333
1    2.750000
2    4.500000
dtype: float64

But when I try to use fillna() to fill the missing dataframe values with values from the series, it doesn't seem to work.

sample_df.fillna(sample_mean, inplace=True)

    A     B     C     D     E     F
0   1.0   1.0   1.0   6.0   1.0   1.0
1   NaN   4.0   1.0   5.0   1.0   NaN
2   5.0   5.0   4.0   5.0   4.0   4.0

What I expect is:

    A     B     C     D     E     F
0   1.0   1.0   1.0   6.0   1.0   1.0
1   2.75  4.0   1.0   5.0   1.0   2.75
2   5.0   5.0   4.0   5.0   4.0   4.0

I've reviewed the other similar questions and can't seem to uncover the issue. Thanks in advance for your help.

Upvotes: 2

Views: 3624

Answers (3)

Vidhya G

Reputation: 2320

Another pandas way:

>>> sample_df.where(pd.notnull(sample_df), sample_df.mean(axis=1), axis='rows')
      A    B    C    D    E     F
0  1.00  1.0  1.0  6.0  1.0  1.00
1  2.75  4.0  1.0  5.0  1.0  2.75
2  5.00  5.0  4.0  5.0  4.0  4.00

This works like an if/else applied element by element: where an element of pd.notnull(sample_df) is True, the corresponding element of sample_df is kept; otherwise the value from sample_df.mean(axis=1) is used. axis='rows' tells where() to align those means with the row labels rather than the column labels.
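The if/else logic described above can be written out end to end as a minimal sketch. The names row_means and filled are illustrative and not part of the original answer, and axis=0 is used here as the equivalent of axis='rows'.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
sample_df = pd.DataFrame({'A': [1.0, np.nan, 5.0],
                          'B': [1.0, 4.0, 5.0],
                          'C': [1.0, 1.0, 4.0],
                          'D': [6.0, 5.0, 5.0],
                          'E': [1.0, 1.0, 4.0],
                          'F': [1.0, np.nan, 4.0]})

# mean(axis=1) skips NaN by default, giving one mean per row.
row_means = sample_df.mean(axis=1)

# Keep non-null cells; elsewhere take the row mean. axis=0 aligns the
# Series index with the row labels instead of the column labels.
filled = sample_df.where(sample_df.notna(), row_means, axis=0)
```

Because where() returns a new DataFrame, sample_df itself is left unchanged unless the result is assigned back.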

Upvotes: 1

BENY

Reputation: 323226

Using only pandas: transpose, fill each column with its mean, then transpose back

sample_df.T.fillna(sample_df.T.mean()).T
Out[1284]: 
      A    B    C    D    E     F
0  1.00  1.0  1.0  6.0  1.0  1.00
1  2.75  4.0  1.0  5.0  1.0  2.75
2  5.00  5.0  4.0  5.0  4.0  4.00
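As a hedged illustration of why the transpose helps: when a DataFrame's fillna() is given a Series, it matches the Series index against the column labels, not the row labels. The sketch below assumes the question's data; the names row_means, unchanged and filled are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
sample_df = pd.DataFrame({'A': [1.0, np.nan, 5.0],
                          'B': [1.0, 4.0, 5.0],
                          'C': [1.0, 1.0, 4.0],
                          'D': [6.0, 5.0, 5.0],
                          'E': [1.0, 1.0, 4.0],
                          'F': [1.0, np.nan, 4.0]})

row_means = sample_df.mean(axis=1)  # index is 0, 1, 2

# The Series index (0, 1, 2) is looked up among the column labels
# ('A'..'F'). Nothing matches, so no NaN is filled.
unchanged = sample_df.fillna(row_means)

# After transposing, the columns are 0, 1, 2, so the labels line up
# and each former row is filled with its own mean.
filled = sample_df.T.fillna(row_means).T
```

This is also why the fillna() call in the question leaves the NaN values in place.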

Upvotes: 1

Divakar

Reputation: 221564

Here's one way -

sample_df[:] = np.where(np.isnan(sample_df), sample_df.mean(axis=1).to_numpy()[:, None], sample_df)

Sample output -

sample_df
Out[61]: 
      A    B    C    D    E     F
0  1.00  1.0  1.0  6.0  1.0  1.00
1  2.75  4.0  1.0  5.0  1.0  2.75
2  5.00  5.0  4.0  5.0  4.0  4.00
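The broadcasting behind this approach can be sketched on plain NumPy arrays. This variant is an assumption-laden illustration rather than the answer's exact code: it uses np.nanmean for the row means and builds a new DataFrame instead of assigning in place, and the names values, row_means and filled are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
sample_df = pd.DataFrame({'A': [1.0, np.nan, 5.0],
                          'B': [1.0, 4.0, 5.0],
                          'C': [1.0, 1.0, 4.0],
                          'D': [6.0, 5.0, 5.0],
                          'E': [1.0, 1.0, 4.0],
                          'F': [1.0, np.nan, 4.0]})

values = sample_df.to_numpy()            # shape (3, 6)
row_means = np.nanmean(values, axis=1)   # shape (3,), NaN ignored

# row_means[:, None] has shape (3, 1), so each row mean is broadcast
# across all six columns of its own row.
filled_values = np.where(np.isnan(values), row_means[:, None], values)

filled = pd.DataFrame(filled_values,
                      index=sample_df.index,
                      columns=sample_df.columns)
```

Without the [:, None] reshape, a length-3 array cannot broadcast against the (3, 6) array, and NumPy raises a shape error.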

Upvotes: 1
