Correlation matrix does not show all columns in Python

I am trying to solve the "House Prices" challenge from Kaggle and I'm stuck on my correlation matrix because it doesn't show all the columns I want. At first I assumed this was because of the large number of columns, so I selected a subset:

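import matplotlib.pyplot as plt
import seaborn as sns

# df_data is the Kaggle training data, already loaded into a DataFrame
df = df_data[['SalePrice', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities']].copy()

corrmax = df.corr()

f, ax = plt.subplots(figsize=(16, 12))
sns.heatmap(corrmax, annot=True)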

And then, the result is a heatmap with only SalePrice, MSSubClass, LotFrontage and LotArea for some reason. Can anyone please help me?

Upvotes: 4

Views: 8889

Answers (1)

Sohaib Aslam

Reputation: 1315

If you analyze the House Prices dataset, you will find about 21-23 categorical variables, such as 'MSZoning' and 'Alley'. corr() only computes the relationship between numerical (non-categorical) variables, so the categorical columns are left out of the matrix:

corrmax = df.corr()
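To check which columns are affected, you can split the frame by dtype before calling corr(). The snippet below is a minimal sketch on a tiny made-up DataFrame (the values are invented; only the column names come from the Kaggle data). Selecting the numeric columns first makes the exclusion explicit instead of relying on corr() to drop the others, which, as far as I know, behaves differently across pandas versions.

```python
import pandas as pd

# Tiny stand-in for the Kaggle data; the values are made up for illustration.
df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000],
    "LotArea": [8450, 9600, 11250, 9550],
    "MSZoning": ["RL", "RL", "RM", "RL"],        # categorical (strings)
    "Street": ["Pave", "Pave", "Grvl", "Pave"],  # categorical (strings)
})

# Columns corr() can work with versus the ones it cannot.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Correlate only the numeric columns, so nothing is dropped implicitly.
corrmax = df[numeric_cols].corr()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Running the same two select_dtypes calls on your own df should list 'MSZoning', 'Street', 'Alley' and the other string columns as categorical, which is why they never reach the heatmap.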

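If you want to find the relationship between categorical and non-categorical variables, you need a different measure. Spearman rank correlation (df.corr(method='spearman')) is one option, but only after the categories have been encoded as numbers, and it is only meaningful when the categories have a natural order. For unordered (nominal) categories, look at measures designed for a nominal and a continuous variable instead.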
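As a rough illustration of both cases, here is a sketch on made-up rows. It makes two assumptions that are not in the answer: that the 'LotShape' levels can be treated as ordered (Reg < IR1 < IR2 < IR3), and that the correlation ratio (eta squared, the share of variance explained by group membership) is an acceptable measure for a nominal column such as 'MSZoning'. The helper correlation_ratio is hypothetical, not a pandas function.

```python
import pandas as pd

# Made-up rows for illustration; only the column names follow the Kaggle data.
df = pd.DataFrame({
    "SalePrice": [100, 110, 200, 210, 300, 310],
    "LotShape": ["Reg", "Reg", "IR1", "IR1", "IR2", "IR2"],
    "MSZoning": ["RM", "RM", "RL", "RL", "FV", "FV"],
})

# Ordinal case: map the levels to integers in an ASSUMED order, then use Spearman.
shape_order = {"Reg": 0, "IR1": 1, "IR2": 2, "IR3": 3}
encoded = df[["SalePrice"]].assign(LotShapeCode=df["LotShape"].map(shape_order))
spearman = encoded.corr(method="spearman")

# Nominal case: correlation ratio (eta squared) = between-group variation
# divided by total variation; 0 means no association, 1 means the category
# fully determines the value.
def correlation_ratio(categories, values):
    grand_mean = values.mean()
    groups = values.groupby(categories)
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    ss_total = ((values - grand_mean) ** 2).sum()
    return ss_between / ss_total if ss_total else 0.0

eta_sq = correlation_ratio(df["MSZoning"], df["SalePrice"])

print(spearman)
print("eta squared:", eta_sq)
```

The Spearman matrix can go straight into sns.heatmap like the original corrmax. The correlation ratio is computed one categorical column at a time, so you would loop over the categorical columns to get a value for each.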

The links below may help:

An overview of correlation measures between categorical and continuous variables

Correlation between a nominal (IV) and a continuous (DV) variable

Upvotes: 4
