alejo

Reputation: 137

How to calculate the correlation of all features with the target variable (binary classifier, python 3)?

I want to calculate, in Python, the correlation between each of my features (all of float type) and the class label (binary, 0 or 1). In addition, I would like to plot the data to visualize the distribution of each feature by class.

I need this to find features that are strongly associated with a single label and to gauge their real importance. Note that I don't want the pairwise correlation between features, and that my classifier is binary.

I have tried the following (from a similar post on Stack Overflow), but it is not exactly what I am looking for.

df.drop("Target", axis=1).apply(lambda x: x.corr(df.Target)) 

The attached picture (from Weka) shows what the distribution looks like for one of the features.

Class distribution for one of the features

Any feedback is really appreciated.

Upvotes: 4

Views: 7745

Answers (1)

Venkatachalam

Reputation: 16966

Correlation is not supposed to be used for categorical variables. For more explanation see here

You can explore the relationship between your independent variables and the target variable with the following approach.

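import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_breast_cancer

# Load the data and keep the first five features so the plot stays readable
data = load_breast_cancer(return_X_y=False)
df = pd.DataFrame(data.data[:, :5], columns=data.feature_names[:5])

# Cast the label to str so seaborn treats it as a categorical hue
df['target'] = data.target.astype(str)

# Scatter plots for each feature pair, histograms on the diagonal, colored by class
g = sns.pairplot(df, hue='target', diag_kind='hist',
                 vars=df.columns[:-1],
                 plot_kws=dict(alpha=0.5),
                 diag_kws=dict(alpha=0.5))
plt.show()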
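
If you still want one number per feature next to the plot, here is a minimal sketch that applies the one-liner from the question to the same five columns. It assumes the label is kept as 0/1 integers (not cast to str as above). Pearson correlation against a 0/1 variable is the point-biserial correlation, so it only reflects how far apart the two class means are relative to the overall spread; it should not be read as feature importance. The names corr and corr_sorted are just illustrative.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data[:, :5], columns=data.feature_names[:5])

# Assumption: keep the label numeric (0/1) so that .corr() can be computed
df["target"] = data.target

# Pearson correlation of each feature with the binary label
# (equivalent to the point-biserial correlation for a 0/1 target)
corr = df.drop("target", axis=1).apply(lambda x: x.corr(df["target"]))

# Order the features by absolute correlation, strongest first
corr_sorted = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(corr_sorted)
```

A value near 0 only rules out a linear shift between the classes, so it is worth checking against the histograms on the diagonal of the pair plot.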


Upvotes: 8
