Reputation: 96458
I am having a strange problem in Pandas. I have a DataFrame with several NaN values. I thought I could fill those NaN values using column means (that is, fill every NaN value with its column mean), but when I try the following:
col_means = mydf.apply(np.mean, 0)
mydf = mydf.fillna(value=col_means)
I still see some NaN values. Why?
Is it because I have more NaN values in my original dataframe than entries in col_means? And what exactly is the difference between fill-by-column vs. fill-by-row?
Upvotes: 5
Views: 9588
Reputation: 375905
You can just fillna with the df.mean() Series (which is dict-like):
In [11]: df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

In [12]: df
Out[12]:
     0    1
0    1  NaN
1  NaN    4
2    5    6

In [13]: df.fillna(df.mean())
Out[13]:
   0  1
0  1  5
1  3  4
2  5  6
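Because fillna treats the mean Series as a mapping from column label to fill value, a plain dict keyed by column label should behave the same way. A minimal sketch using the same toy frame; the dict values below are simply the column means written out by hand:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

# Series indexed by column label: column 0 -> 3.0, column 1 -> 5.0
by_series = df.fillna(df.mean())

# A dict keyed by the same column labels gives the same fill values
by_dict = df.fillna({0: 3.0, 1: 5.0})

print(by_series.equals(by_dict))
```

Either form fills column by column: each key (or Series index label) picks the column, and its value replaces every NaN in that column.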
Note that df.mean() computes the mean of each column (the default axis=0 averages down the rows), which gives the fill values:
In [14]: df.mean()
Out[14]:
0    3
1    5
dtype: float64
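On the question of fill-by-column vs. fill-by-row: df.mean() (axis=0) yields one value per column, and fillna aligns that Series on the column labels. To fill each NaN with its row's mean instead, you need df.mean(axis=1). The sketch below uses a transpose so the rows become columns, fills, and transposes back; this is just one way to do it, not the only idiom.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

col_means = df.mean()        # axis=0: one mean per column (0 -> 3.0, 1 -> 5.0)
row_means = df.mean(axis=1)  # axis=1: one mean per row (0 -> 1.0, 1 -> 4.0, 2 -> 5.5)

# Fill by column: each NaN gets the mean of its column
by_col = df.fillna(col_means)

# Fill by row: transpose so rows become columns, fill, transpose back
by_row = df.T.fillna(row_means).T

print(by_col)
print(by_row)
```

With this frame the cell at row 0, column 1 becomes 5.0 when filling by column but 1.0 when filling by row, because row 0's only non-NaN value is 1.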
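Note: if df.mean() itself contains NaN values (for example, for a column that is entirely NaN), those NaNs are used as the fill values in the DataFrame's fillna, so those cells stay NaN. You may want to call fillna on the mean Series first, i.e.: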
df.mean().fillna(0)
df.fillna(df.mean().fillna(0))
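One way NaNs can survive the fill, as in the question, is a column with no non-NaN values at all: its mean is NaN, so fillna has nothing useful to put there. A minimal sketch, assuming a hypothetical frame whose column "c" is entirely NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "c" has no non-NaN values
df = pd.DataFrame({"a": [1.0, np.nan, 5.0],
                   "b": [np.nan, 4.0, 6.0],
                   "c": [np.nan, np.nan, np.nan]})

means = df.mean()            # "c" -> NaN, since there is nothing to average
still_nan = df.fillna(means)
print(still_nan["c"].isna().all())   # the NaN mean was used as the fill value

# Replace the NaN means first (with 0, as above), then fill the frame
filled = df.fillna(df.mean().fillna(0))
print(filled.isna().any().any())
```

The choice of 0 is only a placeholder; any fallback value that makes sense for the data could be passed to the inner fillna.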
Upvotes: 5