Pandas fillna: Output still has NaN values

Question

I am having a strange problem in Pandas. I have a DataFrame with several NaN values. I thought I could fill each NaN value with the mean of its column, but when I try the following

  import numpy as np
  col_means = mydf.apply(np.mean, 0)   # one mean per column of mydf
  mydf = mydf.fillna(value=col_means)  # fill each NaN with its column's mean

I still see some NaN values. Why?

Is it because my original DataFrame has more NaN values than there are entries in col_means? And what exactly is the difference between filling by column and filling by row?

Andy Hayden · Accepted Answer

You can just call fillna with the df.mean() Series (which is dict-like, mapping each column label to a fill value):

In [11]: df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

In [12]: df
Out[12]:
    0   1
0   1 NaN
1 NaN   4
2   5   6

In [13]: df.fillna(df.mean())
Out[13]:
   0  1
0  1  5
1  3  4
2  5  6
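Because fillna matches the fill values to columns by label, a plain dict keyed by column label should behave the same way as the df.mean() Series. A minimal sketch reusing the small frame from the session above (the literal values 3.0 and 5.0 in the dict are simply its column means):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

# fillna with a Series: each value is used for the column whose label matches its index
filled_series = df.fillna(df.mean())

# fillna with a plain dict keyed by column label does the same job
filled_dict = df.fillna({0: 3.0, 1: 5.0})

print(filled_series.equals(filled_dict))  # True
```

This also bears on the length question: the Series needs one entry per column, not one per NaN; every NaN in a column receives that column's single value.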

Note that df.mean() computes the mean of each column (it aggregates over the rows, axis=0), which gives the fill values:

In [14]: df.mean()
Out[14]:
0    3
1    5
dtype: float64
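To contrast the two directions the question asks about: df.mean() (axis=0) gives one value per column, while df.mean(axis=1) gives one value per row. Since fillna aligns a Series against the column labels, filling by row needs a different route; the sketch below uses apply with axis=1, which is one option among several rather than the only way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

col_means = df.mean()        # one mean per column (axis=0): 0 -> 3.0, 1 -> 5.0
row_means = df.mean(axis=1)  # one mean per row (axis=1): 0 -> 1.0, 1 -> 4.0, 2 -> 5.5

# fill by column: each NaN gets the mean of its column
by_column = df.fillna(col_means)

# fill by row: each NaN gets the mean of its row
by_row = df.apply(lambda row: row.fillna(row.mean()), axis=1)

print(by_column)
print(by_row)
```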

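Note: if df.mean() itself contains NaN values (for example, the mean of a column that is entirely NaN), those NaNs are what fillna fills with, so the corresponding gaps stay NaN. In that case you may want to call fillna on the Series first, i.e.: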

  df.mean().fillna(0)              # the fill values, with any NaN means replaced by 0
  df.fillna(df.mean().fillna(0))   # fill the DataFrame with those values
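A minimal reproduction of the case described in the note, assuming the leftover NaNs come from a column that has no values at all (the column names "a" and "empty" are made up for illustration): its mean is NaN, so fillna(df.mean()) leaves that column untouched until the means themselves are filled.

```python
import numpy as np
import pandas as pd

# hypothetical frame: column "a" has one gap, column "empty" is entirely NaN
df = pd.DataFrame({"a": [1.0, np.nan, 5.0], "empty": [np.nan, np.nan, np.nan]})

means = df.mean()                    # "a" -> 3.0, "empty" -> NaN
still_nan = df.fillna(means)         # the "empty" column is still all NaN
fixed = df.fillna(means.fillna(0))   # NaN means replaced by 0 before filling

print(still_nan.isna().sum())  # a: 0, empty: 3
print(fixed.isna().sum())      # a: 0, empty: 0
```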
