Reputation: 1
I'm studying the relationship between expenditure per student and performance on PISA (a standardized test). I know that this regression can't give me a ceteris paribus relationship, but that is the point of my exercise: I have to explain why it will not work.
I ran the regression in R with this basic code:
lm1=lm(a~b)
The problem is that R reports 32 coefficients, which is the number of units in my population, whereas I should get only the slope and the intercept, given that this is a simple regression.
This is the output that R gives me:
Call:
lm(formula = a ~ b)
Coefficients:
(Intercept) b10167.3 b10467.8 b10766.4 b10863.4 b10960.1 b11.688.4 b11028.1 b11052 b11207.3 b11855.9 b12424.3 b13930.8
522.9936 5.9561 0.3401 -20.6884 -14.8603 -15.0777 -3.5752 -23.0459 -27.1021 -42.2692 -20.4485 -35.3906 -30.7468
b14353.3 b2.997.9 b20450.9 b3714.8 b4996.3 b5291.6 b5851.7 b6190.7 b6663.3 b6725.3 b6747.2 b7074.9 b8189.1
-18.4412 -107.2872 -39.6793 -98.2315 -80.2505 -36.2202 -48.6179 -64.2414 1.3887 -19.0389 -59.9734 -32.0751 -31.5962
b8406.2 b8533.5 b8671.1 b8996.3 b9265.7 b9897.2
-13.4219 -26.0155 -13.9045 -37.9996 -17.0271 -27.2954
As you can see, there are 32 coefficients, while I should get only two. It seems that R is reading each unit of the population as a variable, but the dataset is, as always, set up with the variables in rows. I can't figure out what the problem is.
Upvotes: 0
Views: 2814
Reputation: 187
It's not a problem with the lm function. It appears that R is treating $b$ as a categorical variable. I have made a small data set with 5 observations, containing $a$ (a numeric variable) and $b$ (a categorical variable).
When I fit my model, I get output similar to yours (5 estimated coefficients).
data = data.frame(a = 1:5, b = as.factor(rnorm(5)))
lm(a~b, data)
Call:
lm(formula = a ~ b, data = data)
Coefficients:
(Intercept) b-0.16380292500502 b0.213340249988902 b0.423891299272316 b0.63738307939327
4 -3 -1 1 -2
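A quick way to confirm this diagnosis is to inspect the class of the predictor before fitting. The sketch below rebuilds the toy `data` from above so that it runs on its own; the `set.seed()` call and the name `n_coef` are my additions, not part of the original example.

```r
set.seed(1)  # assumption: seed added only to make the toy data reproducible
data <- data.frame(a = 1:5, b = as.factor(rnorm(5)))

class(data$b)        # "factor", so lm() will not treat it as a number
is.numeric(data$b)   # FALSE
nlevels(data$b)      # one level per distinct value of b

# lm() expands a factor into (number of levels - 1) dummy variables,
# plus the intercept, which is why one coefficient per observation appears
n_coef <- length(coef(lm(a ~ b, data)))
n_coef
```

If `class()` reports `"factor"` or `"character"` for a column that should hold numbers, the conversion most likely went wrong when the data were read in.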
To correct this, you need to convert $b$ into a numeric vector.
data$b = as.numeric(as.character(data$b))
lm(a~b, data)
Call:
lm(formula = a ~ b, data = data)
Coefficients:
(Intercept) b
2.9580 0.2772
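Two details of this conversion are worth spelling out. Calling `as.numeric()` directly on a factor returns the internal level codes rather than the values, which is why the `as.character()` step is needed. Also, some labels in the question's output, such as `11.688.4` and `2.997.9`, contain two dots, so they are not valid numbers and would turn into `NA`. The sketch below uses a hypothetical vector `b_raw` modelled on those labels, and it assumes that the first dot in such entries is a thousands separator. That is only a guess about the data and should be checked against the source file.

```r
# Hypothetical raw values modelled on the labels in the question's output
b_raw <- factor(c("10167.3", "11.688.4", "2.997.9", "8189.1"))

# Pitfall: this returns the integer level codes, not the original values
codes <- as.numeric(b_raw)

# Converting via character keeps the values, but malformed entries become NA
naive <- suppressWarnings(as.numeric(as.character(b_raw)))

# Assumption: in entries with two dots, the first dot is a thousands separator.
# Drop that dot, then convert.
b_chr   <- as.character(b_raw)
b_fixed <- as.numeric(sub("^([0-9]+)\\.([0-9]{3}\\.)", "\\1\\2", b_chr))
b_fixed
```

Counting the `NA` values with `sum(is.na(naive))` after the conversion is a cheap way to spot entries like these before running the regression.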
Upvotes: 3