Charge

Reputation: 1

Too many coefficients with lm

I'm studying the relationship between expenditure per student and performance on PISA (a standardized test). I know that this regression can't give me a ceteris paribus relationship, but that is the point of my exercise: I have to explain why it will not work.

I ran the regression in R with this basic code:

lm1=lm(a~b)

The problem is that R reports 32 coefficients, which is the number of units in my population, whereas I should get only the slope and the intercept, given that this is a simple regression.

This is the output that R gives me:

Call:
lm(formula = a ~ b)

Coefficients:
(Intercept)     b10167.3     b10467.8     b10766.4     b10863.4     b10960.1    b11.688.4     b11028.1       b11052     b11207.3     b11855.9     b12424.3     b13930.8  
   522.9936       5.9561       0.3401     -20.6884     -14.8603     -15.0777      -3.5752     -23.0459     -27.1021     -42.2692     -20.4485     -35.3906     -30.7468  
   b14353.3     b2.997.9     b20450.9      b3714.8      b4996.3      b5291.6      b5851.7      b6190.7      b6663.3      b6725.3      b6747.2      b7074.9      b8189.1  
   -18.4412    -107.2872     -39.6793     -98.2315     -80.2505     -36.2202     -48.6179     -64.2414       1.3887     -19.0389     -59.9734     -32.0751     -31.5962  
    b8406.2      b8533.5      b8671.1      b8996.3      b9265.7      b9897.2  
   -13.4219     -26.0155     -13.9045     -37.9996     -17.0271     -27.2954 

As you can see, there are 32 coefficients, whereas I should get only two. It seems that R is reading each unit of the population as a separate variable, but the dataset is, as always, set up with the variables in rows. I can't figure out what the problem is.

Upvotes: 0

Views: 2814

Answers (1)

nwaldo

Reputation: 187

It's not a problem with the lm function. It appears that R is treating $b$ as a categorical variable (a factor). I have made a small data set with 5 observations: $a$ (a numeric variable) and $b$ (a categorical variable).

When I fit my model, you will see that I get output similar to yours (5 estimated coefficients).

data = data.frame(a = 1:5, b = as.factor(rnorm(5)))
lm(a~b, data)
Call:
lm(formula = a ~ b, data = data)

Coefficients:
       (Intercept)  b-0.16380292500502  b0.213340249988902  b0.423891299272316   b0.63738307939327  
                 4                  -3                  -1                   1                  -2  

To correct this, you need to convert $b$ into a numeric vector.

data$b = as.numeric(as.character(data$b))
lm(a~b, data)
Call:
lm(formula = a ~ b, data = data)

Coefficients:
(Intercept)            b  
     2.9580       0.2772 
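
A note on the conversion above: it goes through as.character() because calling as.numeric() directly on a factor returns the internal level codes rather than the original values. Also, two of the labels in the question's output, b11.688.4 and b2.997.9, contain two dots, so the underlying strings are not valid numbers. That may be why the column was read as categorical in the first place, and as.numeric() would turn those entries into NA. The sketch below illustrates both points. The vector `raw` and the helper `clean_number` are hypothetical, and the helper assumes the extra dots are thousands separators, which is only a guess about the data.

```r
# Hypothetical sample built from a few labels in the question's output
raw <- factor(c("8189.1", "10167.3", "11.688.4", "2.997.9"))

# as.numeric() applied directly to a factor gives level codes, not the values
codes <- as.numeric(raw)

# Going through as.character() recovers the values;
# strings that are not valid numbers ("11.688.4", "2.997.9") become NA
vals <- suppressWarnings(as.numeric(as.character(raw)))

# Hypothetical helper. ASSUMPTION: when a string has more than one dot,
# every dot except the last one is a thousands separator.
clean_number <- function(x) {
  x <- as.character(x)
  # Drop each dot that is followed by another dot later in the string
  as.numeric(gsub("\\.(?=.*\\.)", "", x, perl = TRUE))
}

cleaned <- clean_number(raw)

print(codes)    # level codes, unrelated to the original magnitudes
print(vals)     # NA where the string had two dots
print(cleaned)  # all four entries numeric under the stated assumption
```

If the assumption about the separators does not hold for your file, it is safer to inspect the offending entries (for example with `raw[is.na(vals)]`) and fix them at the source before fitting the model.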

Upvotes: 3
