houda benhar

Reputation: 101

Correlation matrix in pandas doesn't take one column into consideration

I'm working on a classification problem using a dataset containing 39 attributes (38 independent features + the class attribute). When I try to calculate the correlation matrix the class attribute is not taken into consideration. To my knowledge, it should be included in the matrix as well.

len(heartdata.columns)
39

Since my dataframe has 39 columns, the correlation matrix should have shape (39, 39), but what I get is:

cor = heartdata.corr()
cor.shape
(38, 38)

Upvotes: 2

Views: 1549

Answers (2)

Pooya Chavoshi

Reputation: 495

If your features are categorical, you should encode them as numbers first, for example with scikit-learn's LabelEncoder:

from sklearn.preprocessing import LabelEncoder

# train_df is assumed to be your original DataFrame; copy it so it stays unchanged
train = train_df.copy()

# Encode only the non-numeric columns; numeric columns keep their values
for column in train.select_dtypes(exclude="number").columns:
    train[column] = LabelEncoder().fit_transform(train[column])
    print(f"train {column} uniques: {train[column].nunique()}")

x = train
y = train[['gender']]

Then you can compute the correlation matrix:

cor = x.corr()
print(cor)

If you want a plot that shows the correlation between features, I suggest a heatmap:

import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10,8),linewidth=10,edgecolor="#04253a" )
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()


Upvotes: 2

houda benhar

Reputation: 101

My class attribute had a categorical type, which is why the corr() function didn't take it into consideration. A simple encoding solved the problem.

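from sklearn.preprocessing import LabelEncoder

# Replace the categorical class labels with integer codes
le = LabelEncoder()
heartdata['class'] = le.fit_transform(heartdata['class'])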
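
For anyone who wants to reproduce the effect, here is a minimal sketch on a made-up toy frame (the data and the column names age and chol are hypothetical, not my real dataset). It lists the non-numeric columns, shows that a numeric-only correlation leaves them out, and then encodes them. I use pandas category codes here as a stand-in for LabelEncoder so the example needs only pandas. Also note that, as far as I know, pandas 2.0 and later raise an error on non-numeric columns instead of silently dropping them unless you pass numeric_only=True, so the sketch selects the numeric columns explicitly.

```python
import pandas as pd

# Hypothetical toy data standing in for heartdata: two numeric features
# and a string-valued class column.
df = pd.DataFrame({
    "age": [63, 45, 58, 39, 70, 51],
    "chol": [233, 180, 250, 170, 260, 210],
    "class": ["sick", "healthy", "sick", "healthy", "sick", "healthy"],
})

# Columns that a numeric-only correlation would leave out.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Correlation restricted to numeric columns: the class column is missing.
cor_before = df.select_dtypes(include="number").corr()
print(cor_before.shape)

# Encode each non-numeric column as integer codes (categories are sorted,
# so the result is similar in effect to LabelEncoder).
encoded = df.copy()
for col in non_numeric:
    encoded[col] = encoded[col].astype("category").cat.codes

# After encoding, every column is numeric and appears in the matrix.
cor_after = encoded.corr()
print(cor_after.shape)
```

With this toy frame the shape goes from (2, 2) to (3, 3). Keep in mind that integer codes impose an arbitrary order on the categories, so the resulting coefficients are easiest to interpret for binary or genuinely ordinal columns.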

Upvotes: 1
