Xalion

Reputation: 657

How to compute correlations for a DataFrame with NaN values?

I use the DataFrame.corr() method from pandas. It returns a correlation matrix, but it drops every column that contains even one NaN value. Is it possible to compute correlations for a DataFrame that contains NaN values?

Upvotes: 1

Views: 3982

Answers (2)

razimbres

Reputation: 5015

One approach is to deal with the NaN values first, for example by dropping the rows that contain them:

df2 = df.dropna()

Or replace them with the column means:

df2 = df.fillna(df.mean())

Or use an algorithm like EM (expectation maximization) for imputation.

Then compute the correlations:

df2.corr()

Note: if more than 15% of a variable's values are missing, you should consider dropping that variable from the analysis.
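The dropna and fillna options above can be compared side by side with a minimal sketch. The sample data and the column names a, b and c are made up for illustration. The last line also calls corr() on the untouched frame, since the pandas documentation describes corr() as excluding NA/null values pairwise; whether that is enough for your data is something to check yourself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "b": [2.0, 4.0, 6.0, np.nan, 10.0, 12.0],
    "c": [6.0, 5.0, 4.0, 3.0, np.nan, 1.0],
})

# Option 1: drop every row that contains at least one NaN.
corr_dropped = df.dropna().corr()

# Option 2: replace each NaN with the mean of its own column.
df_filled = df.fillna(df.mean())
corr_filled = df_filled.corr()

# For comparison: corr() on the raw frame, which skips NaN pair by pair.
corr_pairwise = df.corr()

print(corr_dropped)
print(corr_filled)
print(corr_pairwise)
```

Dropping rows discards data from every column, while mean imputation keeps all rows but tends to pull correlations toward zero, so the three matrices will generally differ.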

Upvotes: 1

Karolina Cabaj

Reputation: 28

Try this; it worked in my case:

 df = df.apply(pd.to_numeric, errors='coerce')
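One possible explanation for why this helps (an assumption, since the question does not show the data) is that some columns were stored as strings, so they were not treated as numeric when computing correlations. A small sketch with a made-up frame shows what the conversion does: values that cannot be parsed become NaN and the columns get a numeric dtype.

```python
import pandas as pd

# Hypothetical frame whose numbers were read in as strings (object dtype).
df = pd.DataFrame({
    "x": ["1", "2", "3", "4"],
    "y": ["2.0", "n/a", "6.0", "8.0"],
})

# Convert every column to a numeric dtype; unparseable entries become NaN.
numeric_df = df.apply(pd.to_numeric, errors="coerce")

print(numeric_df.dtypes)

# Both columns are numeric now, so both appear in the correlation matrix.
corr = numeric_df.corr()
print(corr)
```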

Upvotes: 1
