setbackademic

Reputation: 143

statistically comparing linear regressions in R

I would like to compare regression models fitted to data collected in three locations, to see whether location makes a difference.

Consider some toy data:

mydata <- read.table(header=TRUE, text="
location    height  weight
    spain   178 90
    spain   187 80
    spain   155 70
    spain   187 85
    spain   150 60
    spain   155 73
    spain   168 80
    spain   160 75
    spain   177 77
    spain   178 83
    russia  165 60
    russia  161 55
    russia  187 94
    russia  175 77
    russia  170 70
    russia  181 90
    russia  173 72
    russia  163 58
    russia  177 80
    russia  167 67
    peru    177 75
    peru    182 65
    peru    145 55
    peru    176 70
    peru    150 45
    peru    155 58
    peru    168 65
    peru    160 60
    peru    177 62
    peru    178 68
        ")

I know I can use ANOVAs etc. to see if there is a treatment (i.e. country) difference in height or weight, but I am not sure whether I can do this explicitly for the regressions (i.e. is there a difference in the relationship between height and weight between countries?). For this example, I would like to assume that weight is a function of height.

If you fit a regression for each country, you'll see that spain and peru have a similar slope but different intercepts, while russia has a much steeper slope and a different intercept. How can I formally test this (ideally with significance values etc.)?
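For reference, the per-country fits mentioned above can be obtained roughly as follows. This is only a sketch: the data frame is the toy data rebuilt in compact form, and the names `fits` and `coefs` are made up for illustration.

```r
# Toy data from the question, rebuilt compactly
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)

# Fit weight ~ height separately within each location
fits <- lapply(split(mydata, mydata$location),
               function(d) lm(weight ~ height, data = d))

# One row per location: intercept and slope
coefs <- t(sapply(fits, coef))
print(coefs)
```

This only shows the three fitted lines side by side; it does not test whether they differ.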

Thank you in advance - and apologies for my obviously inadequate statistical background.

Upvotes: 1

Views: 1380

Answers (3)

Dimitrios Zacharatos

Reputation: 850

I just want to add a bit to the previous answer, "lm(weight~height*location, mydata)". When it comes to regressions I think of ANOVAs, so if you wrap the model from the previous answer in anova():

anova(lm(weight~height*location, mydata))

Analysis of Variance Table

Response: weight
                Df  Sum Sq Mean Sq F value    Pr(>F)    
height           1 2103.80 2103.80 109.738 1.969e-10 ***
location         2  859.82  429.91  22.425 3.219e-06 ***
height:location  2  585.24  292.62  15.264 5.286e-05 ***
Residuals       24  460.11   19.17                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

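Pr(>F) shows which terms are significant; here we have *** across the board. The height:location row is the one that tests whether the slope of weight on height differs between locations.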

Before you run ANOVAs you have to check the assumptions of equality of variance and normality of the residuals.
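A minimal sketch of how those checks might look in base R, assuming the interaction model from above. The choice of Shapiro-Wilk and Bartlett tests is my assumption (other diagnostics are equally valid), and the names `fit`, `res`, `sw` and `bt` are just for illustration.

```r
# Toy data from the question, rebuilt compactly
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)

fit <- lm(weight ~ height * location, data = mydata)
res <- residuals(fit)

# Normality of the residuals (Shapiro-Wilk)
sw <- shapiro.test(res)
print(sw)

# Equality of residual variance across locations (Bartlett)
bt <- bartlett.test(res, mydata$location)
print(bt)

# Visual checks: residuals vs fitted and a normal Q-Q plot
# plot(fit, which = 1:2)
```

With only ten points per country these tests have little power, so the diagnostic plots are probably at least as informative as the p-values.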

Upvotes: 2

Neal Fultz

Reputation: 9687

Typically I would include an interaction term in the model:

lm(weight~height*location, mydata)

If the interaction coefficients are significantly different from zero, there's evidence that the slope differs between locations. You can still use ANOVA for an overall test of the interaction.
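One way to run that overall test is to compare a model with a common slope against the interaction model. The sketch below assumes the toy data from the question; the names `fit_parallel`, `fit_interact` and `cmp` are hypothetical.

```r
# Toy data from the question, rebuilt compactly
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)

# Common slope, location-specific intercepts
fit_parallel <- lm(weight ~ height + location, data = mydata)

# Location-specific slopes and intercepts
fit_interact <- lm(weight ~ height * location, data = mydata)

# F test of the two extra slope parameters
cmp <- anova(fit_parallel, fit_interact)
print(cmp)

# Per-coefficient estimates, relative to the reference level of location
print(summary(fit_interact)$coefficients)
```

The F test answers "do the slopes differ at all?", while the coefficient table shows how each location's slope and intercept differ from the reference level (the alphabetically first location, unless the factor is releveled).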

Upvotes: 3

HarlandMason

Reputation: 789

I think the Chow Test may be appropriate?

It looks like it's implemented in the gap package.

library('gap')
mydata <- read.table(header=TRUE, text="
   location    height  weight
   spain   178 90
   spain   187 80
   spain   155 70
   spain   187 85
   spain   150 60
   spain   155 73
   spain   168 80
   spain   160 75
   spain   177 77
   spain   178 83
   russia  165 60
   russia  161 55
   russia  187 94
   russia  175 77
   russia  170 70
   russia  181 90
   russia  173 72
   russia  163 58
   russia  177 80
   russia  167 67
   peru    177 75
   peru    182 65
   peru    145 55
   peru    176 70
   peru    150 45
   peru    155 58
   peru    168 65
   peru    160 60
   peru    177 62
   peru    178 68
   ")

y1 <- subset(mydata, location=='spain')$weight
y2 <- subset(mydata, location=='peru')$weight
x1 <- subset(mydata, location=='spain')$height
x2 <- subset(mydata, location=='peru')$height

chow.test(y1, x1, y2, x2)

     F value        d.f.1        d.f.2      P value 
1.701350e+01 2.000000e+00 1.600000e+01 1.094774e-04 

The small P value suggests that spain and peru are in fact best served by different models. Note that the Chow test checks the intercept and slope jointly, so on its own it does not say which of the two differs.
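If you would rather not depend on gap, the textbook Chow statistic can be computed by hand in base R from the pooled and separate residual sums of squares. This is a sketch under the assumption that chow.test uses the standard formula; the helper `rss` and the other variable names are made up for illustration.

```r
# Toy data from the question, rebuilt compactly
mydata <- data.frame(
  location = rep(c("spain", "russia", "peru"), each = 10),
  height = c(178, 187, 155, 187, 150, 155, 168, 160, 177, 178,
             165, 161, 187, 175, 170, 181, 173, 163, 177, 167,
             177, 182, 145, 176, 150, 155, 168, 160, 177, 178),
  weight = c(90, 80, 70, 85, 60, 73, 80, 75, 77, 83,
             60, 55, 94, 77, 70, 90, 72, 58, 80, 67,
             75, 65, 55, 70, 45, 58, 65, 60, 62, 68)
)

two <- subset(mydata, location %in% c("spain", "peru"))

# Residual sum of squares of weight ~ height on a data subset
rss <- function(d) sum(residuals(lm(weight ~ height, data = d))^2)

rss_pooled <- rss(two)
rss_split  <- rss(subset(two, location == "spain")) +
              rss(subset(two, location == "peru"))

k   <- 2                   # parameters per model: intercept and slope
df2 <- nrow(two) - 2 * k   # residual df of the two separate fits

# Chow F statistic and its upper-tail p-value
F_stat  <- ((rss_pooled - rss_split) / k) / (rss_split / df2)
p_value <- pf(F_stat, k, df2, lower.tail = FALSE)

print(c(F = F_stat, df1 = k, df2 = df2, p = p_value))
```

The degrees of freedom (2 and 16) match the chow.test output above, and the F value should agree as well if the same formula is used.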

Upvotes: 2
