Reputation: 2127
I'm using the statsmodels library to check for the impact of confounding variables on a dependent variable by performing multivariate linear regression:
from statsmodels.formula.api import ols

model = ols(f'{metric}_diff ~ {" + ".join(confounding_variable_names)}', data=df).fit()
This is what my data looks like (only 2 rows pasted, shown transposed; the row index is named IDs):
Column                                     ID_K_1_3       ID_K_4_5
Age                                        25             19
Sex                                        Female         Male
Experience using a gamepad (1-4)           4              4
Experience using a VR headset (1-4)        3              2
Experience using hand tracking (1-3)       1              1
Experience using controllers in VR (1-3)   2              2
Glasses                                    Yes            Yes
ID_1                                       K_1            K_4
ID_2                                       K_3            K_5
Method_1                                   controller     controller
Method_2                                   handTracking   handTracking
ID_controller                              K_1            K_4
ID_handTracking                            K_3            K_5
CorrectGestureCounter_controller           21             21
CorrectGestureCounter_handTracking         34             36
IncorrectGestureCounter_controller         5              14
IncorrectGestureCounter_handTracking       2              17
When I execute model.summary() I get output like this:
OLS Regression Results
======================================================================================
Dep. Variable: CorrectGestureCounter_diff R-squared: 0.477
Model: OLS Adj. R-squared: 0.249
Method: Least Squares F-statistic: 2.088
Date: Wed, 28 Dec 2022 Prob (F-statistic): 0.105
Time: 15:29:41 Log-Likelihood: -73.565
No. Observations: 24 AIC: 163.1
Df Residuals: 16 BIC: 172.6
Df Model: 7
Covariance Type: nonrobust
==========================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------------
Intercept -24.6404 9.326 -2.642 0.018 -44.410 -4.871
Sex[T.Male] -7.3225 3.170 -2.310 0.035 -14.043 -0.602
Glasses[T.Yes] -2.4210 2.995 -0.808 0.431 -8.771 3.929
Age 0.2957 0.183 1.613 0.126 -0.093 0.684
Experience_using_a_gamepad_1_4 1.8810 1.853 1.015 0.325 -2.047 5.809
Experience_using_a_VR_headset_1_4 0.9559 3.213 0.297 0.770 -5.856 7.768
Experience_using_hand_tracking_1_3 -2.4689 3.633 -0.680 0.506 -10.170 5.232
Experience_using_controllers_in_VR_1_3 2.3592 4.840 0.487 0.633 -7.902 12.620
==============================================================================
Omnibus: 0.621 Durbin-Watson: 2.566
Prob(Omnibus): 0.733 Jarque-Bera (JB): 0.702
Skew: -0.277 Prob(JB): 0.704
Kurtosis: 2.371 Cond. No. 205.
==============================================================================
What do the [T.Male] or [T.Yes] next to Sex and Glasses mean? How should I interpret this? Also, why is Intercept added next to my variables? Should I care about it in the context of confounding variables?
Upvotes: 3
Views: 6116
Reputation: 66
rsenne's answer is very complete; I would just like to add that, given everything rsenne says about the intercept, the intercept is basically the expected mean value of your dependent variable when all predictors are zero (or at their reference levels). I also recommend reading the Medium article below, which explains specifically how to interpret the output of this summary.
https://medium.com/swlh/interpreting-linear-regression-through-statsmodels-summary-4796d359035a
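For example (a minimal sketch, assuming `model` is the fitted results object from the question, that the columns were renamed to the underscored names shown in the summary, and that both Glasses levels occur in your data): if you predict with every numeric predictor at zero and every categorical predictor at its reference level, the prediction collapses to the intercept alone.

import pandas as pd

# Hypothetical baseline row: all numeric confounders at 0 and the
# categorical ones at their reference levels (Female, no glasses).
baseline = pd.DataFrame({
    'Age': [0],
    'Sex': ['Female'],
    'Glasses': ['No'],
    'Experience_using_a_gamepad_1_4': [0],
    'Experience_using_a_VR_headset_1_4': [0],
    'Experience_using_hand_tracking_1_3': [0],
    'Experience_using_controllers_in_VR_1_3': [0],
})

# Both lines should print (roughly) the -24.6404 from the summary.
print(model.predict(baseline))
print(model.params['Intercept'])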
Upvotes: 1
Reputation: 367
This is more of a stats question, but I'll do my best to help. A multivariate regression is of the form:

Y = Xβ + U

where Y, β, and U are the vectors of the dependent variable, the coefficients, and the error terms respectively, and X is the design matrix that houses all of your predictor variables, such as Age, Glasses, etc. Onto your question about the intercept: for a single observation, the above equation can be written as:

y = β_0 + β_1*x_1 + β_2*x_2 + ... + β_p*x_p + u

From this we can see that "beta naught" (β_0) is an intercept that does not depend on any of your predictor variables. Just like the b in the basic slope formula y = mx + b, that β_0 term is the intercept your regression is showing: if all other terms are zero, your response variable starts at -24.6404. It is the base value of your regression, added to each and every prediction.
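You can actually see that intercept as a column of ones in the design matrix that patsy (the formula engine statsmodels uses) builds for you. A minimal sketch with made-up data, not your actual dataset:

import pandas as pd
from patsy import dmatrix

# Two made-up rows, just to show the design matrix structure.
data = pd.DataFrame({'Age': [25, 19], 'Sex': ['Female', 'Male']})

# Patsy prepends a constant column named 'Intercept', so beta naught
# is estimated like any other coefficient. Categorical terms are
# listed right after it, as in your summary.
X = dmatrix('Age + Sex', data=data, return_type='dataframe')
print(X)
#    Intercept  Sex[T.Male]   Age
# 0        1.0          0.0  25.0
# 1        1.0          1.0  19.0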
As for the other variables, i.e. Glasses and Sex, you basically have what is called a "dummy variable", that is to say:

x_Sex = I(Sex = Male),  x_Glasses = I(Glasses = Yes)

where I(·) is an indicator function that equals 1 if the condition is true and 0 otherwise, so the x vectors corresponding to Sex and Glasses are binary vectors. That is what the bracket notation means: the T stands for patsy's "treatment" coding, and the level in brackets is the one coded as 1. Thus in your example, Male (T.Male) is encoded as a 1 and having glasses (T.Yes) is also a 1, while Female and no glasses are encoded as 0. The interpretation is: if a participant is male, add -7.3225 to the prediction, and if they wear glasses, add -2.4210; otherwise add nothing (because anything times zero is zero).
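If you want to see that encoding concretely, pandas' get_dummies does the same treatment coding (a minimal sketch with made-up rows, not your actual data):

import pandas as pd

# Made-up rows just to illustrate treatment ("dummy") coding.
data = pd.DataFrame({'Sex': ['Female', 'Male', 'Male'],
                     'Glasses': ['No', 'Yes', 'No']})

# drop_first=True drops the reference levels (Female, No), mirroring
# patsy: the remaining columns play the role of Sex[T.Male] and
# Glasses[T.Yes] in your summary.
print(pd.get_dummies(data, drop_first=True, dtype=int))
#    Sex_Male  Glasses_Yes
# 0         0            0
# 1         1            1
# 2         1            0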
Hope that helped! I can't say much about your specific use case because I don't know exactly what statistical questions you have, but this is at least a quick crash course in understanding the output of your regression.
Upvotes: 5