Correlation with categorical dependent variables

Question

My data has roughly this structure:

Category	Value1	Value2	Value3
A	5.8	7.2	8.8
A	5.7	6.7	4.5
B	8.5	7.3	2.2
C	5.3	0.4	4.1
C	4.2	9.5	9.3
C	5.9	7.6	5.3
D	7.6	3.5	2.3
D	6.8	8.8	6.4

My aim is to calculate the correlations, i.e. to find out whether Values 1-3 are affected differently depending on the category. For example, can we say that Category A leads to a higher Value 1 than the other categories? What is the best and shortest way to achieve this in Python?

sophocles · Accepted Answer

I am not fully sure how you want to approach this. But given your question, you can compare the Value columns across categories in a 'short' way using a grouped mean:

df.groupby('Category').mean()

            Value1    Value2    Value3
Category                              
A         5.750000  6.950000  6.650000
B         8.500000  7.300000  2.200000
C         5.133333  5.833333  6.233333
D         7.200000  6.150000  4.350000
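For anyone who wants to reproduce this, here is a minimal self-contained sketch. The numbers are copied from the table in the question; the names `group_means` and `group_sizes` are my own. The group sizes are worth a look too, since Category B has only a single row.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "C", "C", "D", "D"],
    "Value1": [5.8, 5.7, 8.5, 5.3, 4.2, 5.9, 7.6, 6.8],
    "Value2": [7.2, 6.7, 7.3, 0.4, 9.5, 7.6, 3.5, 8.8],
    "Value3": [8.8, 4.5, 2.2, 4.1, 9.3, 5.3, 2.3, 6.4],
})

# Mean of every Value column per category
group_means = df.groupby("Category").mean()

# Number of rows per category (B has just one observation)
group_sizes = df.groupby("Category").size()

print(group_means)
print(group_sizes)
```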

This shows you that, contrary to your expectations, Category A has a lower mean Value 1 (5.75) than B (8.50) and D (7.20); only C (5.13) is lower.

You can also calculate the relative change for each category, moving from each Value column to the next. The results are fractions, so 0.208696 means roughly +20.9%:

df.groupby('Category').mean().pct_change(axis=1).fillna(0)

          Value1    Value2    Value3
Category                            
A            0.0  0.208696 -0.043165
B            0.0 -0.141176 -0.698630
C            0.0  0.136364  0.068571
D            0.0 -0.145833 -0.292683
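If the goal is to compare one category against the others rather than one Value column against the next, a possible variation is to subtract the overall column means from the group means. This is only a sketch of my own and not part of the approach above; the names `value_cols`, `overall_means` and `deviation` are assumptions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "C", "C", "D", "D"],
    "Value1": [5.8, 5.7, 8.5, 5.3, 4.2, 5.9, 7.6, 6.8],
    "Value2": [7.2, 6.7, 7.3, 0.4, 9.5, 7.6, 3.5, 8.8],
    "Value3": [8.8, 4.5, 2.2, 4.1, 9.3, 5.3, 2.3, 6.4],
})

value_cols = ["Value1", "Value2", "Value3"]

# Per-category means and the means over all rows
group_means = df.groupby("Category")[value_cols].mean()
overall_means = df[value_cols].mean()

# Positive entries: the category sits above the overall mean for that column
deviation = group_means - overall_means

print(deviation.round(3))
```

Note that the overall mean weights every row equally, so categories with more rows (C here) pull it towards their own mean.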

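To get the p-values, you can use a very simple linear regression with Category as a categorical predictor. There are many sources online that will help you here. However, in its simplest form (each C(Category)[T.x] row compares that category against the reference category A):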

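from statsmodels.formula.api import ols

# Regress Value1 on Category; C() marks Category as categorical
fit = ols('Value1 ~ C(Category)', data=df).fit()
# fit.summary()  # uncomment for the full regression table

# p-values as a two-column table
fit.pvalues.reset_index().rename({0: 'p_values'}, axis=1)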

              index  p_values
0         Intercept  0.000269
1  C(Category)[T.B]  0.028933
2  C(Category)[T.C]  0.372288
3  C(Category)[T.D]  0.097482
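The same model can be fitted for all three Value columns in a loop. The sketch below collects the overall F-test p-value of each fit (`f_pvalue` on the statsmodels results object), which tests whether Category explains any of the variation in that column at all. The names `results` and `overall_p` are my own, and with only eight rows any p-value here should be read with a lot of caution.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Sample data copied from the question
df = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "C", "C", "D", "D"],
    "Value1": [5.8, 5.7, 8.5, 5.3, 4.2, 5.9, 7.6, 6.8],
    "Value2": [7.2, 6.7, 7.3, 0.4, 9.5, 7.6, 3.5, 8.8],
    "Value3": [8.8, 4.5, 2.2, 4.1, 9.3, 5.3, 2.3, 6.4],
})

results = {}
for col in ["Value1", "Value2", "Value3"]:
    # One regression per Value column, Category as categorical predictor
    fit = ols(f"{col} ~ C(Category)", data=df).fit()
    # p-value of the F-test that all category coefficients are zero
    results[col] = float(fit.f_pvalue)

overall_p = pd.Series(results, name="f_pvalue")
print(overall_p)
```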
