Dror Hilman

Reputation: 7457

pandas fillna() not working properly

I am trying to build a simple function that fills missing values in pandas columns with draws from a distribution, but it fails to fill the whole table (the DataFrame still has NaN values after fillna()):

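import pandas as pd

def simple_impute_missing(df):
    from numpy.random import normal
    # One column of len(df) normal draws per column, using that column's
    # mean and std, for the columns from position 3 onward
    rnd_filled = pd.DataFrame({c: normal(df[c].mean(), df[c].std(), len(df))
                               for c in df.columns[3:]})

    filled_df = df.fillna(rnd_filled)
    return filled_df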

But the returned DataFrame still has NaNs!

I have checked that rnd_filled contains no NaNs and has the right shape. What is going on?

Upvotes: 0

Views: 1478

Answers (1)

jezrael

Reputation: 862641

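I think you need to remove [3:] from df.columns[3:] so that all columns of df are selected. When fillna() receives a DataFrame, it aligns on index and column labels, so any column that is missing from rnd_filled (here the first three) is simply not filled.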
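A minimal sketch of that alignment rule, using a hypothetical two-column frame: the fill frame covers only column B, mimicking how df.columns[3:] leaves out the leading columns. After fillna(), B is filled while A keeps its NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each column has one missing value
df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'B': [np.nan, 5.0, 6.0]})

# Fill frame that only has column 'B', like rnd_filled built from df.columns[3:]
partial = pd.DataFrame({'B': [0.0, 0.0, 0.0]})

# fillna aligns on labels: 'A' has no counterpart in partial, so it is left alone
out = df.fillna(partial)
print(out)
```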

Sample:

import numpy as np
import pandas as pd
from numpy.random import normal

df = pd.DataFrame({'A':[1,np.nan,3],
                   'B':[4,5,6],
                   'C':[np.nan,8,9],
                   'D':[1,3,np.nan],
                   'E':[5,np.nan,6],
                   'F':[7,np.nan,3]})

print (df)
     A  B    C    D    E    F
0  1.0  4  NaN  1.0  5.0  7.0
1  NaN  5  8.0  3.0  NaN  NaN
2  3.0  6  9.0  NaN  6.0  3.0

# Random draws for every column, so fillna() has a value for each label
rnd_filled = pd.DataFrame({c: normal(df[c].mean(), df[c].std(), len(df))
                           for c in df.columns})

filled_df = df.fillna(rnd_filled)
print (filled_df)
          A  B         C         D         E         F
0  1.000000  4  6.922458  1.000000  5.000000  7.000000
1  2.277218  5  8.000000  3.000000  5.714767  6.245759
2  3.000000  6  9.000000  0.119522  6.000000  3.000000
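A minimal sketch of a somewhat more defensive version of the question's function, assuming you want to impute every numeric column. The seed parameter, the select_dtypes() filter and the use of numpy's default_rng() Generator are my own additions, not part of the answer above. Passing index=df.index matters for the same alignment reason: if rnd_filled were built with the default 0..n-1 index, none of its rows would line up with labels such as 10, 20, 30 and nothing would be filled.

```python
import numpy as np
import pandas as pd


def simple_impute_missing(df, seed=None):
    """Fill NaNs in every numeric column with normal draws that use
    that column's own mean and standard deviation."""
    rng = np.random.default_rng(seed)  # seed is a hypothetical extra, for reproducibility
    numeric_cols = df.select_dtypes(include='number').columns
    # Reuse df.index so fillna() aligns row-for-row even with a non-default index
    rnd_filled = pd.DataFrame(
        {c: rng.normal(df[c].mean(), df[c].std(), len(df)) for c in numeric_cols},
        index=df.index,
    )
    return df.fillna(rnd_filled)


df = pd.DataFrame({'A': [1, np.nan, 3],
                   'B': [4, 5, 6],
                   'C': [np.nan, 8, 9]},
                  index=[10, 20, 30])  # non-default index on purpose
filled_df = simple_impute_missing(df, seed=0)
print(filled_df)
```

Existing values are left untouched; only the NaN positions receive random draws.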

Upvotes: 1
