vijaya lakshmi

Reputation: 515

Correlation Matrix in pandas showing only few columns

I have a dataframe with the following columns. [image: list of dataframe columns] When I compute the correlation matrix, I see only the columns that have int data types. I am new to ML. Can someone explain what mistake I am making here?

Upvotes: 1

Views: 3238

Answers (3)

Python16367225

Reputation: 131

Convert numbers that are stored as strings (object dtype) to numeric values using pd.to_numeric.

df = df.apply(pd.to_numeric)  # pass the function itself, not a list; raises if a column holds non-numeric text

Also, convert all categorical data, such as city names, to dummy variables that can be used to compute correlation, as is done in this thread. Essentially, every column you want to include in the correlation needs to be a float or an integer, preferably all one or the other. Otherwise, you're likely to run into problems.
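To make the two steps above concrete, here is a minimal sketch. The dataframe, its column names, and its values are made up for illustration and are not the asker's data. It converts a numeric-looking string column with pd.to_numeric and one-hot encodes a categorical column with pd.get_dummies before calling corr.

```python
import pandas as pd

# Hypothetical data: "price" holds numbers stored as strings, "city" is categorical.
df = pd.DataFrame({
    "rooms": [2, 3, 4, 5],
    "price": ["100", "150", "210", "260"],
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
})

# Convert the numeric-looking string column to a real numeric dtype.
df["price"] = pd.to_numeric(df["price"])

# One-hot encode the categorical column; dtype=float keeps every column numeric.
encoded = pd.get_dummies(df, columns=["city"], dtype=float)

# Every column is now numeric, so all of them appear in the matrix.
corr = encoded.corr()
print(corr.columns.tolist())
```

If a column mixes real numbers with stray text, pd.to_numeric(..., errors="coerce") turns the unparseable entries into NaN instead of raising, which may or may not be what you want.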

Upvotes: 1

Celius Stingher

Reputation: 18367

As you correctly observe, and as @Kraigolas states from the docs:

numeric_only : bool, default True
    Include only float, int or boolean data.

This means that, by default, corr only computes values for numeric columns. You can change this by using:

df.corr(numeric_only=False)

However, this means pandas will try to convert the values to float to perform the correlation. If the values in a column are not numeric, it will fail with:

ValueError: could not convert string to float: 'X'
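A small sketch of both behaviours, using a made-up dataframe (the column names and values are assumptions, and the numeric_only keyword on corr requires pandas 1.5 or newer). The exact error message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data with two numeric columns and one string column.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2.0, 4.1, 6.2, 8.1],
    "label": ["X", "Y", "X", "Y"],
})

# numeric_only=True silently drops the string column.
numeric_corr = df.corr(numeric_only=True)
print(numeric_corr.columns.tolist())

# numeric_only=False tries to cast every column to float and fails on strings.
try:
    df.corr(numeric_only=False)
    failed = False
except (ValueError, TypeError) as exc:
    failed = True
    print(f"corr failed: {exc}")
```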

Upvotes: 2

Kraigolas

Reputation: 5560

From the docs, numeric_only is set to True by default in the corr function. You need to set it to False so that it also considers non-numeric columns. Observe that the columns in your final result were the only ones with numeric dtypes.

This default is deprecated, though: starting with pandas 2.0, numeric_only defaults to False, so corr no longer drops non-numeric columns silently.
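Because the default depends on the pandas version, one option is to not rely on it at all and select the numeric columns yourself. This is a sketch with a made-up dataframe, not something from the docs quoted above:

```python
import pandas as pd

# Hypothetical data: two numeric columns and one string column.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [4, 3, 2, 1],
    "label": ["X", "Y", "X", "Y"],
})

# Keep only numeric dtypes explicitly, then correlate.
numeric_df = df.select_dtypes(include="number")
corr = numeric_df.corr()
print(corr)
```

This gives the same result regardless of what numeric_only defaults to, and it makes the dropped columns visible in the code.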

Upvotes: 1
