Reputation: 8014
I have a pandas DataFrame df
with multiple columns:
Salary Dept Approve
1500 IT Yes
1200 Finance No
1200 IT No
1300 HR Yes
1800 Finance No
1100 Finance No
1600 Finance No
1500 IT Yes
1200 HR Yes
1500 HR Yes
I want to find the relationship between Salary and Approve, and between Dept and Approve.
A plain correlation does not work because some of the columns are categorical rather than numerical.
What other options do I have? How can I measure the correlation for Salary/Approve and for Dept/Approve?
Upvotes: 1
Views: 376
Reputation: 7170
One way you can do this is by converting the categorical variables to dummies, and then computing correlations against each of them:
import pandas as pd
dummies = pd.get_dummies(df)
From there it's easy to compute correlations between whatever combinations you like:
>>> dummies.corr()
                 Salary  Dept_Finance    Dept_HR    Dept_IT  Approve_No  Approve_Yes
Salary         1.000000      0.134865  -0.175072   0.030895   -0.047193     0.047193
Dept_Finance   0.134865      1.000000  -0.534522  -0.534522    0.816497    -0.816497
Dept_HR       -0.175072     -0.534522   1.000000  -0.428571   -0.654654     0.654654
Dept_IT        0.030895     -0.534522  -0.428571   1.000000   -0.218218     0.218218
Approve_No    -0.047193      0.816497  -0.654654  -0.218218    1.000000    -1.000000
Approve_Yes    0.047193     -0.816497   0.654654   0.218218   -1.000000     1.000000
Or a subset:
>>> dummies[['Salary', 'Dept_HR']].corr()
            Salary    Dept_HR
Salary    1.000000  -0.175072
Dept_HR  -0.175072   1.000000
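If you want something you can paste and run, here is a minimal sketch of the same approach. It assumes the table from the question is rebuilt by hand as df, and the name approve_corr is just an illustrative choice. Passing dtype=int is also my assumption: it forces 0/1 columns, since newer pandas versions produce boolean dummies by default.

```python
import pandas as pd

# Rebuild the question's table by hand (assumed to match the real df)
df = pd.DataFrame({
    "Salary": [1500, 1200, 1200, 1300, 1800, 1100, 1600, 1500, 1200, 1500],
    "Dept": ["IT", "Finance", "IT", "HR", "Finance",
             "Finance", "Finance", "IT", "HR", "HR"],
    "Approve": ["Yes", "No", "No", "Yes", "No",
                "No", "No", "Yes", "Yes", "Yes"],
})

# One 0/1 column per category; the numeric Salary column is left untouched
dummies = pd.get_dummies(df, dtype=int)

# Pearson correlation between every pair of columns
corr = dummies.corr()

# Keep only the correlations against Approve_Yes, dropping the
# Approve columns themselves (they are trivially +1 and -1)
approve_corr = corr["Approve_Yes"].drop(["Approve_Yes", "Approve_No"])
print(approve_corr)
```

The printed series should line up with the Approve_Yes column of the full matrix above, which covers both Salary/Approve and Dept/Approve in one go.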
Upvotes: 2