Reputation: 101
I'm working on a classification problem using a dataset containing 39 attributes (38 independent features + the class attribute). When I try to calculate the correlation matrix the class attribute is not taken into consideration. To my knowledge, it should be included in the matrix as well.
len(heartdata.columns)
39
Since my dataframe has 39 columns, the correlation matrix should have shape (39, 39), but what I get is:
cor = heartdata.corr()
cor.shape
(38, 38)
Upvotes: 2
Views: 1549
Reputation: 495
If your features are categorical, you should encode them first, for example with LabelEncoder:
from sklearn.preprocessing import LabelEncoder
# Note: train is the same object as train_df, so train_df is modified in place
train = train_df
label_encoder = LabelEncoder()
# Note: this encodes every column, including any that are already numeric
for column in train.columns:
    train[column] = label_encoder.fit_transform(train_df[column])
    print(f"train {column} uniques: {len(train[column].unique())}")
x = train
y = train_df['gender'].to_frame(name='gender')
Then you can compute the correlation matrix:
cor = x.corr()
print(cor)
If you want to plot the correlation between features, I suggest a heatmap:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10,8),linewidth=10,edgecolor="#04253a" )
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()
Upvotes: 2
Reputation: 101
My class attribute was categorical, which is why corr() didn't take it into consideration. A simple encoding solved the problem.
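from sklearn.preprocessing import LabelEncoder
# Replace the categorical class labels with integer codes
le = LabelEncoder()
heartdata['class'] = le.fit_transform(heartdata['class'])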
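For anyone who wants to reproduce the effect, here is a minimal sketch on a made-up three-column frame (the data and the column names `age` and `chol` are hypothetical, not taken from heartdata). It assumes that only numeric columns can take part in the correlation, which is why the string column is missing at first. Depending on your pandas version, calling corr() directly on a frame with a string column may silently skip that column or raise an error, so the sketch selects the numeric columns explicitly.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for heartdata
df = pd.DataFrame({
    "age": [63, 45, 52, 38, 70, 59],
    "chol": [233, 180, 250, 199, 286, 210],
    "class": ["sick", "healthy", "sick", "healthy", "sick", "healthy"],
})

# Only the numeric columns enter the correlation, so "class" is left out
numeric_only_shape = df.select_dtypes(include="number").corr().shape

# Encode the string column as integers on a copy, then correlate again
encoded = df.copy()
encoded["class"] = LabelEncoder().fit_transform(encoded["class"])
encoded_shape = encoded.corr().shape

print(numeric_only_shape, encoded_shape)  # (2, 2) (3, 3)
```

Checking `heartdata.dtypes` before calling corr() is a quick way to spot which columns are not numeric.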
Upvotes: 1