user2543622

Reputation: 6786

linear regression r comparing multiple observations vs single observation

Based on the answers to my earlier question, I should get the same intercept and regression coefficient from the two models below, but they are not the same. What is going on?

Is something wrong with my code, or is the original answer wrong?

#linear regression average qty per price point vs all quantities

x1=rnorm(30,20,1);y1=rep(3,30)
x2=rnorm(30,17,1.5);y2=rep(4,30)
x3=rnorm(30,12,2);y3=rep(4.5,30)
x4=rnorm(30,6,3);y4=rep(5.5,30)
x=c(x1,x2,x3,x4)
y=c(y1,y2,y3,y4)
plot(y,x)
cor(y,x)
fit=lm(x~y)
attributes(fit)
summary(fit)

xdum=c(20,17,12,6)
ydum=c(3,4,4.5,5.5)
plot(ydum,xdum)
cor(ydum,xdum)
fit1=lm(xdum~ydum)
attributes(fit1)
summary(fit1)


> summary(fit)

Call:
lm(formula = x ~ y)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.3572 -1.6069 -0.1007  2.0222  6.4904 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  40.0952     1.1570   34.65   <2e-16 ***
y            -6.1932     0.2663  -23.25   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.63 on 118 degrees of freedom
Multiple R-squared:  0.8209,    Adjusted R-squared:  0.8194 
F-statistic: 540.8 on 1 and 118 DF,  p-value: < 2.2e-16

> summary(fit1)

Call:
lm(formula = xdum ~ ydum)

Residuals:
      1       2       3       4 
-0.9615  1.8077 -0.3077 -0.5385 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  38.2692     3.6456  10.497  0.00895 **
ydum         -5.7692     0.8391  -6.875  0.02051 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.513 on 2 degrees of freedom
Multiple R-squared:  0.9594,    Adjusted R-squared:  0.9391 
F-statistic: 47.27 on 1 and 2 DF,  p-value: 0.02051

Upvotes: 2

Views: 298

Answers (2)

thelatemail

Reputation: 93908

You are not calculating xdum and ydum in a comparable fashion: rnorm only approximates the mean you specify, particularly when you sample just 30 cases, so the observed group means of x are not exactly 20, 17, 12 and 6. This is easily fixed by using the observed group means instead:

coef(fit)
#(Intercept)           y 
#  39.618472   -6.128739 

xdum <- c(mean(x1),mean(x2),mean(x3),mean(x4))
ydum <- c(mean(y1),mean(y2),mean(y3),mean(y4))
coef(lm(xdum~ydum))
#(Intercept)        ydum 
#  39.618472   -6.128739 
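
A reproducible version of this check, as a minimal sketch. The seed value and the names `fit_all`, `fit_means` and `grp_means` are illustrative assumptions, not part of the original code. Because every group has the same number of observations (30), the unweighted fit on the four group means should reproduce the coefficients of the full fit up to floating-point error.

```r
set.seed(42)  # arbitrary seed (an assumption), only so the run is repeatable

# Same simulation as in the question: 4 groups of 30 draws each
x <- c(rnorm(30, 20, 1), rnorm(30, 17, 1.5), rnorm(30, 12, 2), rnorm(30, 6, 3))
y <- rep(c(3, 4, 4.5, 5.5), each = 30)

# Fit on all 120 observations
fit_all <- lm(x ~ y)

# Observed mean of x within each distinct value of y
grp_means <- tapply(x, y, mean)
xdum <- as.numeric(grp_means)
ydum <- as.numeric(names(grp_means))

# Fit on the 4 group means
fit_means <- lm(xdum ~ ydum)

# Compare coefficients, ignoring the differing names ("y" vs "ydum")
all.equal(unname(coef(fit_all)), unname(coef(fit_means)))
```

Only the coefficients agree. Standard errors, R-squared and p-values still differ, since the second fit has 2 residual degrees of freedom instead of 118.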

Upvotes: 4

Hack-R

Reputation: 23210

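In theory the two fits should give the same coefficients if (and only if) the mean of each group of x values in the former model is equal to the corresponding point in the latter model. This relies on every group having the same number of observations, as yours do.

This is not the case in your models, so the results are slightly different. For example, the mean of x1: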

x1=rnorm(30,20,1)
mean(x1)

20.08353

where the point version is 20.

There are similar tiny differences in your other rnorm samples:

> mean(x2)
[1] 17.0451
> mean(x3)
[1] 11.72307
> mean(x4)
[1] 5.913274
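
The size of these deviations is roughly what the standard error of a mean, sd / sqrt(n), predicts for n = 30. A small sketch, where the vector names `sds` and `se` are illustrative and the sd values are the ones passed to rnorm in the question:

```r
# Standard deviations passed to rnorm() for each group in the question
sds <- c(x1 = 1, x2 = 1.5, x3 = 2, x4 = 3)

# Standard error of the mean of 30 independent draws
se <- sds / sqrt(30)
round(se, 3)
#    x1    x2    x3    x4 
# 0.183 0.274 0.365 0.548 
```

So a sample mean that is off by a few tenths from the nominal value is expected, and more so for x4 than for x1.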

A side note: by convention y is the dependent variable and x is the independent variable, which you have reversed. That makes no difference to the comparison here, but it is worth knowing.

Upvotes: 2
