Aiden Blake

Reputation: 117

Correlation with categorical dependent variables

My data have approximately this structure:

Category Value1 Value2 Value3
A 5.8 7.2 8.8
A 5.7 6.7 4.5
B 8.5 7.3 2.2
C 5.3 0.4 4.1
C 4.2 9.5 9.3
C 5.9 7.6 5.3
D 7.6 3.5 2.3
D 6.8 8.8 6.4

My aim is to calculate the correlations, i.e. whether Values 1-3 are affected differently depending on the category. For example, can we say that Category A leads to a higher Value 1 than the other categories? What is the best and shortest way to achieve this in Python?

Upvotes: 0

Views: 209

Answers (1)

sophocles

Reputation: 13821

I am not fully sure how you want to approach this. But given your question, you can check how the Value columns differ across categories in a 'short' way using a grouped mean:

df.groupby('Category').mean()

            Value1    Value2    Value3
Category                              
A         5.750000  6.950000  6.650000
B         8.500000  7.300000  2.200000
C         5.133333  5.833333  6.233333
D         7.200000  6.150000  4.350000

This shows that, contrary to your expectation, Category A does not lead to a higher Value1 than the other categories: its mean (5.75) is lower than those of B (8.5) and D (7.2), and only C (5.13) is lower.
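For anyone who wants to reproduce the grouped means, here is a minimal sketch. It assumes the table from the question is typed in by hand into a DataFrame called df (the question does not say how the data are actually loaded), and the names means and ranking are only illustrative:

```python
import pandas as pd

# Data copied from the table in the question (assumption: entered by hand).
df = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "C", "C", "D", "D"],
    "Value1": [5.8, 5.7, 8.5, 5.3, 4.2, 5.9, 7.6, 6.8],
    "Value2": [7.2, 6.7, 7.3, 0.4, 9.5, 7.6, 3.5, 8.8],
    "Value3": [8.8, 4.5, 2.2, 4.1, 9.3, 5.3, 2.3, 6.4],
})

# Mean of every Value column within each category.
means = df.groupby("Category").mean()

# Categories ordered by their mean Value1, highest first.
ranking = means["Value1"].sort_values(ascending=False)
print(ranking)
```

Sorting one column of the grouped means makes it easy to see where a given category sits relative to the others, rather than reading it off the full table.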

You can also calculate the relative change for each category, moving from each Value column to the next (as a fraction, so 0.21 means +21%):

df.groupby('Category').mean().pct_change(axis=1).fillna(0)

          Value1    Value2    Value3
Category                            
A            0.0  0.208696 -0.043165
B            0.0 -0.141176 -0.698630
C            0.0  0.136364  0.068571
D            0.0 -0.145833 -0.292683

To get the p-values, you can use a very simple linear regression. There are many sources online that will help you here. However, in its simplest form:

from statsmodels.formula.api import ols
fit = ols('Value1 ~ C(Category)', data=df).fit() 
#fit.summary() 

>>> fit.pvalues.reset_index().rename({0:'p_values'},axis=1)

              index  p_values
0         Intercept  0.000269
1  C(Category)[T.B]  0.028933
2  C(Category)[T.C]  0.372288
3  C(Category)[T.D]  0.097482
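Note that each coefficient's p-value above compares one category against the baseline, Category A. If what you want is a single answer per Value column to "does Category matter at all?", one option is the overall F-test of the same regression. This is a rough sketch, not the only way to do it: it assumes the question's table has been entered by hand as df, and overall_p is just an illustrative name.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Data copied from the table in the question (assumption: entered by hand).
df = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "C", "C", "D", "D"],
    "Value1": [5.8, 5.7, 8.5, 5.3, 4.2, 5.9, 7.6, 6.8],
    "Value2": [7.2, 6.7, 7.3, 0.4, 9.5, 7.6, 3.5, 8.8],
    "Value3": [8.8, 4.5, 2.2, 4.1, 9.3, 5.3, 2.3, 6.4],
})

# Fit one regression per Value column and keep the p-value of the
# overall F-test (null hypothesis: all category means are equal).
overall_p = {}
for col in ["Value1", "Value2", "Value3"]:
    fit = ols(f"{col} ~ C(Category)", data=df).fit()
    overall_p[col] = fit.f_pvalue

print(pd.Series(overall_p, name="overall_p_value"))
```

With only eight rows (and a single observation for Category B), any of these p-values should be read with a lot of caution.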

Upvotes: 2
