Reputation: 657
I use the DataFrame.corr() method from pandas. It returns a correlation matrix, but it leaves out columns that contain even one NaN value. Is it possible to compute correlations in a DataFrame that contains NaN values?
Upvotes: 1
Views: 3982
Reputation: 5015
One option is to get rid of the NaN values first, by dropping every row that contains one:
df2 = df.dropna()
Or replace them with the column means:
df2 = df.fillna(df.mean())
Or use an algorithm like EM (expectation maximization) for imputation.
Then compute the correlations:
df2.corr()
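To make the two simpler options concrete, here is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions, not data from the question). It also calls corr() on the raw data for comparison, since the pandas documentation describes corr() as excluding NA/null values pairwise.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" has one missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, np.nan, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Option 1: drop every row that contains at least one NaN.
dropped = df.dropna()

# Option 2: replace each NaN with the mean of its own column.
filled = df.fillna(df.mean())

corr_dropped = dropped.corr()
corr_filled = filled.corr()

# For comparison: corr() on the raw data, which skips NaN pair by pair.
corr_raw = df.corr()

print(corr_dropped)
print(corr_filled)
print(corr_raw)
```

Note that the options do not have to agree: dropna() throws away the whole row for every column, while mean imputation keeps the row but tends to pull correlations towards zero.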
Note: if the missing-value rate of a given variable is higher than 15%, you should consider dropping it from the analysis.
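A rough sketch of how the 15% rule of thumb could be checked, again on made-up data. The name `threshold` and the example columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is 10% missing, "c" is 30% missing.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "b": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "c": [np.nan, np.nan, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

threshold = 0.15  # the 15% rule of thumb from the note above

# isna() gives a boolean frame; its column means are the missing-value rates.
missing_rate = df.isna().mean()

# Keep only the columns whose missing-value rate is within the threshold.
keep = missing_rate[missing_rate <= threshold].index
df_reduced = df[keep]

print(missing_rate)
print(list(df_reduced.columns))
```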
Upvotes: 1
Reputation: 28
Try this. It worked in my case. It converts every column to a numeric type and turns values that cannot be parsed into NaN:
df = df.apply(pd.to_numeric, errors='coerce')
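A small sketch of what that conversion does, assuming the columns were read in as text (the column names and the "n/a" placeholder are made up for illustration):

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, with one unparseable entry.
df = pd.DataFrame({
    "x": ["1", "2", "3", "4"],
    "y": ["2.0", "n/a", "6.0", "8.0"],
})

# Convert each column to a numeric dtype; anything that cannot be parsed
# (here "n/a") becomes NaN instead of raising an error.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Both columns are numeric now, so they can both take part in corr().
corr = numeric.corr()

print(numeric.dtypes)
print(corr)
```

If a column is missing from the result of corr(), checking df.dtypes for non-numeric columns is a reasonable first step before reaching for imputation.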
Upvotes: 1