Reputation: 143
I would like to compare regression models fitted to data collected in three locations, to see whether location makes a difference.
Consider some toy data:
mydata <- read.table(header=TRUE, text="
location height weight
spain 178 90
spain 187 80
spain 155 70
spain 187 85
spain 150 60
spain 155 73
spain 168 80
spain 160 75
spain 177 77
spain 178 83
russia 165 60
russia 161 55
russia 187 94
russia 175 77
russia 170 70
russia 181 90
russia 173 72
russia 163 58
russia 177 80
russia 167 67
peru 177 75
peru 182 65
peru 145 55
peru 176 70
peru 150 45
peru 155 58
peru 168 65
peru 160 60
peru 177 62
peru 178 68
")
I know I can use ANOVA etc. to see if there is a treatment (i.e. country) difference in height or weight, but I am not sure whether I can do this explicitly for the REGRESSIONS (i.e. is there a difference in the relationship between height and weight in different countries?). For this example, I would like to assume that weight is a function of height.
If you fit a regression for each country, you'll see that spain and peru have a similar slope but a different intercept, while russia has a much steeper slope and a different intercept. How can I formally test this (ideally with significance values etc.)?
Thank you in advance - and apologies for my obviously inadequate statistical background.
Upvotes: 1
Views: 1380
Reputation: 850
I just want to add a bit to the previous answer, "lm(weight~height*location, mydata)". When it comes to regressions I think of ANOVAs, so if you wrap the model from that answer in anova() you get:
anova(lm(weight~height*location, mydata))
Analysis of Variance Table

Response: weight
                Df  Sum Sq Mean Sq F value    Pr(>F)
height           1 2103.80 2103.80 109.738 1.969e-10 ***
location         2  859.82  429.91  22.425 3.219e-06 ***
height:location  2  585.24  292.62  15.264 5.286e-05 ***
Residuals       24  460.11   19.17
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
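Pr(>F) gives the p-value for each term, and here all three are significant (***). The height:location row is the one that answers the question: it tests whether the slope of weight on height differs between locations. Note that anova() on an lm fit uses sequential sums of squares, so each row is tested after the terms above it.
Before relying on these tests, check the assumptions of equal variance and normality of the residuals.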
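As a rough sketch of those checks: the choice of the Shapiro-Wilk and Bartlett tests here is my own assumption, not something the answer prescribes, and with only ten points per location the diagnostic plots are probably more informative than the tests. The data are rebuilt inline so the block runs on its own.

```r
# Toy data from the question, rebuilt inline so this block is self-contained
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)

fit <- lm(weight ~ height * location, data = mydata)
res <- residuals(fit)

# Normality of the residuals (Shapiro-Wilk test)
sw <- shapiro.test(res)
print(sw)

# Equality of residual variance across locations (Bartlett test)
bt <- bartlett.test(res, g = factor(mydata$location))
print(bt)

# Visual checks (residuals vs fitted, normal Q-Q); uncomment when working interactively
# plot(fit, which = 1:2)
```

Small p-values from either test would hint that the assumption in question is doubtful; large ones do not prove it holds.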
Upvotes: 2
Reputation: 9687
Typically I would include an interaction term in the model:
lm(weight~height*location, mydata)
If the interaction coefficients are large relative to their standard errors (see summary() of the fit), there's evidence that the slope differs between locations. You can still use ANOVA for an overall test of the interaction.
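One way to make the overall test concrete is to compare nested models with anova(): a single common line, parallel lines (separate intercepts), and fully separate lines. This is a minimal sketch; the object names (fit_common, fit_parallel, fit_separate) are only illustrative, and the data are rebuilt inline so the block runs on its own.

```r
# Toy data from the question, rebuilt inline so this block is self-contained
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)

# Nested models, from most to least restrictive
fit_common   <- lm(weight ~ height, data = mydata)             # one line for all locations
fit_parallel <- lm(weight ~ height + location, data = mydata)  # same slope, different intercepts
fit_separate <- lm(weight ~ height * location, data = mydata)  # different slopes and intercepts

# Sequential F tests: row 2 tests the intercepts, row 3 tests the slopes
cmp <- anova(fit_common, fit_parallel, fit_separate)
print(cmp)

# Per-coefficient estimates, standard errors and p-values for the full model
print(summary(fit_separate)$coefficients)
```

The third row of the comparison is the test of interest: a small p-value there suggests that at least one location has a different slope.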
Upvotes: 3
Reputation: 789
I think the Chow Test may be appropriate?
It looks like it's implemented in the gap package.
library('gap')
mydata <- read.table(header=TRUE, text="
location height weight
spain 178 90
spain 187 80
spain 155 70
spain 187 85
spain 150 60
spain 155 73
spain 168 80
spain 160 75
spain 177 77
spain 178 83
russia 165 60
russia 161 55
russia 187 94
russia 175 77
russia 170 70
russia 181 90
russia 173 72
russia 163 58
russia 177 80
russia 167 67
peru 177 75
peru 182 65
peru 145 55
peru 176 70
peru 150 45
peru 155 58
peru 168 65
peru 160 60
peru 177 62
peru 178 68
")
y1 <- subset(mydata, location=='spain')$weight
y2 <- subset(mydata, location=='peru')$weight
x1 <- subset(mydata, location=='spain')$height
x2 <- subset(mydata, location=='peru')$height
chow.test(y1, x1, y2, x2)
     F value        d.f.1        d.f.2      P value
1.701350e+01 2.000000e+00 1.600000e+01 1.094774e-04
The small P value suggests that spain and peru are in fact best served by different models. Note that this call compares only two of the three locations; the other pairs would need their own tests.
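For readers without the gap package, the textbook Chow statistic can be sketched in base R. This assumes the standard formula, F = ((RSS_pooled - RSS_separate) / k) / (RSS_separate / (n - 2k)) with k = 2 parameters per line; I have not checked that it is computed identically inside chow.test. The helper rss() and the object names are hypothetical.

```r
# Toy data from the question, rebuilt inline so this block is self-contained
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)
two <- subset(mydata, location %in% c("spain", "peru"))

# Hypothetical helper: residual sum of squares of a fitted model
rss <- function(fit) sum(residuals(fit)^2)

fit_pooled <- lm(weight ~ height, data = two)
fit_spain  <- lm(weight ~ height, data = subset(two, location == "spain"))
fit_peru   <- lm(weight ~ height, data = subset(two, location == "peru"))

k       <- 2                              # parameters per line: intercept and slope
n       <- nrow(two)                      # 20 observations in total
rss_sep <- rss(fit_spain) + rss(fit_peru)

f_stat <- ((rss(fit_pooled) - rss_sep) / k) / (rss_sep / (n - 2 * k))
p_val  <- pf(f_stat, k, n - 2 * k, lower.tail = FALSE)
print(c(F = f_stat, df1 = k, df2 = n - 2 * k, p = p_val))

# The same test expressed as a nested-model comparison
cmp <- anova(fit_pooled, lm(weight ~ height * location, data = two))
print(cmp)
```

The two F statistics printed here should agree with each other, which shows that the Chow test is the interaction-model comparison restricted to two groups.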
Upvotes: 2