Reputation: 7457
I am trying to build a simple function that fills the missing values in pandas columns with draws from a distribution, but it fails to fill the whole table (df still has NaN after fillna ...)
def simple_impute_missing(df):
    from numpy.random import normal
    rnd_filled = pd.DataFrame({c: normal(df[c].mean(), df[c].std(), len(df))
                               for c in df.columns[3:]})
    filled_df = df.fillna(rnd_filled)
    return filled_df
But the returned df still has NaNs!
I have checked that rnd_filled is fully populated and has the right shape. What is going on?
Upvotes: 0
Views: 1478
Reputation: 862641
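I think you need to remove the [3:] slice from df.columns[3:] so that all columns of df are selected. fillna with a DataFrame aligns on column labels, so the first three columns, which are missing from rnd_filled, are never filled.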
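To illustrate the alignment behaviour described above, here is a minimal sketch. The two-column frame and the name `filler` are made up for the illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: both columns contain one NaN.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 2.0]})

# The filler frame only has column "B", mimicking df.columns[3:]
# skipping some of the original columns.
filler = pd.DataFrame({"B": [10.0, 20.0]})

out = df.fillna(filler)

# "B" is filled from filler; "A" has no counterpart and keeps its NaN.
print(out)
```
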
Sample:
import numpy as np
import pandas as pd
from numpy.random import normal

df = pd.DataFrame({'A':[1,np.nan,3],
                   'B':[4,5,6],
                   'C':[np.nan,8,9],
                   'D':[1,3,np.nan],
                   'E':[5,np.nan,6],
                   'F':[7,np.nan,3]})
print (df)
     A  B    C    D    E    F
0  1.0  4  NaN  1.0  5.0  7.0
1  NaN  5  8.0  3.0  NaN  NaN
2  3.0  6  9.0  NaN  6.0  3.0
rnd_filled = pd.DataFrame({c: normal(df[c].mean(), df[c].std(), len(df))
                           for c in df.columns})
filled_df = df.fillna(rnd_filled)
print (filled_df)
          A  B         C         D         E         F
0  1.000000  4  6.922458  1.000000  5.000000  7.000000
1  2.277218  5  8.000000  3.000000  5.714767  6.245759
2  3.000000  6  9.000000  0.119522  6.000000  3.000000
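If NaNs still remain after selecting all columns, another possible cause is the row index. fillna with a DataFrame aligns on the index as well as on the columns, and a frame built from plain arrays gets a default 0..n-1 index. The question does not show df's index, so this is only an assumption. Below is a rough sketch of a variant that reuses df's index and restricts itself to numeric columns. The function name `impute_normal`, the `seed` parameter and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def impute_normal(df, seed=None):
    """Fill NaNs in numeric columns with draws from N(mean, std) of each column."""
    rng = np.random.default_rng(seed)
    # Only numeric columns have a meaningful mean and standard deviation.
    numeric_cols = df.select_dtypes(include="number").columns
    filler = pd.DataFrame(
        {c: rng.normal(df[c].mean(), df[c].std(), len(df)) for c in numeric_cols},
        index=df.index,  # reuse df's index so fillna can align row by row
    )
    return df.fillna(filler)

# Hypothetical data with a non-default index.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": [4, 5, 6],
                   "C": [np.nan, 8.0, 9.0]},
                  index=["x", "y", "z"])

filled = impute_normal(df, seed=0)
print(filled)
```

Existing values are left untouched, and only the NaN cells receive random draws. Passing a fixed seed makes the fill reproducible.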
Upvotes: 1