Reputation: 1667
I am trying to predict a variable using a number of explanatory variables, none of which has a visually detectable relationship with it; that is, the scatterplots between each regressor and the predicted variable are completely flat clouds.
I took 2 approaches:
1) Running individual regressions, which yields no significant relationship at all.
2) Once I play around with multiple combinations of multivariable regressions, I get significant relationships for some combinations (which are not robust though; that is, a variable is significant in one setting and loses significance in a different setting).
I am wondering whether, based on 1), i.e. the fact that on an individual basis there seems to be no relationship at all, I can conclude that a multivariable approach is destined to fail as well?
Upvotes: 2
Views: 566
Reputation: 10160
The answer is most definitely no, it is not guaranteed to fail. In fact, you've already observed this in #2, where you get significant predictors in a multiple-predictor setting.
A regression between 1 predictor and 1 outcome amounts to the covariance or the correlation between the two variables: the slope is the covariance rescaled by the variance of the predictor, and testing the slope is equivalent to testing the correlation. This is the relationship you observe in your scatterplots.
A regression where you have multiple predictors (multiple regression) has a rather different interpretation. Let's say you have a model like: Y = b0 + b1X1 + b2X2. Here b1 is interpreted as the relationship between X1 and Y holding X2 constant. That is, you are controlling for the effect of X2 in this model. This is a very important feature of multiple regression.
To see this, run the following models:
Y = b0 + b1X1
Y = b0 + b1X1 + b2X2
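You will see that the values of b1 in the two models are generally different. The size of the difference depends on the magnitude of the covariance/correlation between X1 and X2 (and on how strongly X2 relates to Y); if X1 and X2 are uncorrelated in your sample, the two b1 estimates coincide.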
Just because a straight correlation between 2 variables is not significant does not mean that the relationship will remain non-significant once you control for the effect of other predictors.
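To make this concrete, here is a minimal simulated sketch in Python (NumPy only). Everything in it is an assumption chosen for illustration, not your data: two highly correlated predictors x1 and x2 that enter Y with opposite signs, plus the sample size, noise levels, and the small `ols` helper. In such a setup the scatterplot of Y against x1 alone looks like a flat cloud, yet x1 has a clear effect once x2 is held constant.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their t-statistics.
    X must already contain an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 400

# Assumed data-generating process (illustrative only):
# x1 and x2 share a common component z, so they are highly correlated.
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
# Y depends on x1 and x2 with opposite signs, so their effects
# nearly cancel when you look at either predictor on its own.
y = x1 - x2 + 0.2 * rng.normal(size=n)

ones = np.ones(n)
beta_simple, t_simple = ols(np.column_stack([ones, x1]), y)
beta_multi, t_multi = ols(np.column_stack([ones, x1, x2]), y)

print("Y ~ X1:      b1 = %.3f, t = %.2f" % (beta_simple[1], t_simple[1]))
print("Y ~ X1 + X2: b1 = %.3f, t = %.2f" % (beta_multi[1], t_multi[1]))
print("             b2 = %.3f, t = %.2f" % (beta_multi[2], t_multi[2]))
```

In the single-predictor model the slope on x1 is close to zero, while in the two-predictor model b1 is close to 1 and b2 close to -1, both with large t-statistics. Note that this works because x1 is still weakly related to Y marginally; the example is not a claim that this pattern is present in your data.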
This point is highlighted by your example of robustness in #2. Why would a predictor be significant in some models, and non-significant when you use another subset of predictors? It is precisely because you are controlling for the effect of different variables in your various models.
Which other variables you choose to control for, and ultimately which specific regression model you choose to use, depends on what your goals are.
Upvotes: 1