asmgx

Reputation: 8014

Correlation for classification in Python

I have a pandas DataFrame df with multiple columns:

Salary  Dept      Approve
1500    IT        Yes
1200    Finance   No
1200    IT        No
1300    HR        Yes
1800    Finance   No
1100    Finance   No
1600    Finance   No
1500    IT        Yes
1200    HR        Yes
1500    HR        Yes

I want to find the relationship between Salary and Approve, and between Dept and Approve.

A plain correlation does not work because some of the columns are categorical rather than numerical.

What other options do I have? How can I measure the association for Salary/Approve and Dept/Approve?

Upvotes: 1

Views: 376

Answers (1)

TayTay

Reputation: 7170

One way to do this is to convert the categorical variables to dummy (one-hot) columns, and then compute correlations against each of them:

import pandas as pd
dummies = pd.get_dummies(df)  # df is the DataFrame from the question

From there it's easy to compute correlations between whatever combinations you like:

>>> dummies.corr()
                Salary  Dept_Finance   Dept_HR   Dept_IT  Approve_No  Approve_Yes
Salary        1.000000      0.134865 -0.175072  0.030895   -0.047193     0.047193
Dept_Finance  0.134865      1.000000 -0.534522 -0.534522    0.816497    -0.816497
Dept_HR      -0.175072     -0.534522  1.000000 -0.428571   -0.654654     0.654654
Dept_IT       0.030895     -0.534522 -0.428571  1.000000   -0.218218     0.218218
Approve_No   -0.047193      0.816497 -0.654654 -0.218218    1.000000    -1.000000
Approve_Yes   0.047193     -0.816497  0.654654  0.218218   -1.000000     1.000000

Or a subset:

>>> dummies[['Salary', 'Dept_HR']].corr()
           Salary   Dept_HR
Salary   1.000000 -0.175072
Dept_HR -0.175072  1.000000
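
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample data from the question and pulls out only the correlations against approval. The variable name approve_corr and the dtype=float argument are my own choices, not part of the original snippet; dtype=float is there on the assumption that your pandas version may otherwise return boolean dummy columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Salary": [1500, 1200, 1200, 1300, 1800, 1100, 1600, 1500, 1200, 1500],
    "Dept": ["IT", "Finance", "IT", "HR", "Finance",
             "Finance", "Finance", "IT", "HR", "HR"],
    "Approve": ["Yes", "No", "No", "Yes", "No",
                "No", "No", "Yes", "Yes", "Yes"],
})

# One-hot encode the categorical columns (Dept, Approve).
# dtype=float keeps every resulting column numeric for corr().
dummies = pd.get_dummies(df, dtype=float)

# Pearson correlation of each remaining column with the "Yes" indicator.
# The two Approve_* columns are dropped because they are trivially +/-1.
approve_corr = dummies.corr()["Approve_Yes"].drop(["Approve_Yes", "Approve_No"])
print(approve_corr.round(6))
```

Note that a Pearson correlation between two 0/1 columns is the phi coefficient, and between a numeric column and a 0/1 column it is the point-biserial correlation, so these numbers are still meaningful for categorical data.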

Upvotes: 2
