bountiful

Reputation: 814

Fitting a linear model

I have a data frame that looks like

> t
          Institution Subject Class       ML1     ML1SD
aPhysics0           A Physics     0 0.8730469 0.3329205
aPhysics1           A Physics     1 0.8471074 0.3598839
aPhysics2           A Physics     2 0.8593750 0.3476343
aPhysics3           A Physics     3 0.8875000 0.3159806
aPhysics4           A Physics     4 0.7962963 0.4027512

And I want to fit a linear function to ML1 against Class, but when I call

> lm(ML1 ~ Class, data=t)

I get:

Call:
lm(formula = ML1 ~ Class, data = t)

Coefficients:
(Intercept)       Class1       Class2       Class3       Class4  
    0.87305     -0.02594     -0.01367      0.01445     -0.07675  

I don't really understand this output: it looks like it is giving me a separate coefficient for each value of Class (there are 5 Class values, 0-4), whereas what I want is a single intercept and a single gradient value.

Also, when I call lm with weights = 1/ML1SD^2 it does not change any of the values.

What am I doing wrong?

Upvotes: 1

Views: 122

Answers (1)

Arun

Reputation: 118779

lm is treating Class as a categorical variable. I suspect that your Class column is stored as a factor.

The result of the regression is correct: you get an intercept plus one estimate for each of the 4 non-reference classes. By default, the first level (0, in your case) is taken as the reference level, so the intercept is the mean of ML1 for Class 0 and every other estimate is a difference from that reference. That is, mean(Class1) - mean(Class0) is equal to -0.02594.
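As a quick check of this reference-level interpretation, here is a minimal sketch that rebuilds the ML1 column from the question and compares the fitted coefficients with differences of group means. It assumes Class is stored as a factor, which the question does not confirm; the names fit, group_means and diffs are only illustrative.

```r
# ML1 values copied from the question. Storing Class as a factor is an
# assumption about how the original data frame was built.
df <- data.frame(
  Class = factor(0:4),
  ML1   = c(0.8730469, 0.8471074, 0.8593750, 0.8875000, 0.7962963)
)

fit <- lm(ML1 ~ Class, data = df)

# Mean of ML1 within each level of Class (a named numeric vector).
group_means <- sapply(split(df$ML1, df$Class), mean)

# Difference of each non-reference level from the reference level (Class 0).
diffs <- group_means[-1] - group_means[1]

coef(fit)  # intercept = mean of Class 0; the rest should match diffs
diffs
```

Each level has a single observation here, so the "group mean" is just that one ML1 value.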

If you don't want Class to be treated as a categorical variable and would rather model it as a continuous variable, you must convert the factor to a numeric (or integer) type: df$Class <- as.numeric(as.character(df$Class)), where df is your data frame (t in the question). Then you get:

> lm(ML1 ~ Class, data=df)

Call:
lm(formula = ML1 ~ Class, data = df)

Coefficients:
(Intercept)        Class  
    0.87529     -0.01131  

But are you sure Class is a continuous variable?
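The as.character() step in the conversion matters. The sketch below uses a hypothetical factor with the same labels as in the question to show what each conversion returns; the variable names are only illustrative.

```r
# Hypothetical factor whose labels are the numbers 0 to 4, as in the question.
Class <- factor(c("0", "1", "2", "3", "4"))

# as.numeric() alone returns the internal level codes, not the labels.
codes <- as.numeric(Class)                  # 1 2 3 4 5

# Going through as.character() first recovers the labels as numbers.
labels <- as.numeric(as.character(Class))   # 0 1 2 3 4

codes
labels
```

With these particular labels the codes are simply the labels shifted by one, which would change the intercept but not the slope. With labels such as 10, 20, 30 the codes would be on a different scale altogether.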

Edit: The weights parameter does have an effect. It performs a weighted linear least squares regression. You can see that for the categorical variable when you do summary(.):

summary(lm(ML1 ~ factor(Class), data = df))

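But in your case, each level of the categorical variable has only one observation. The categorical model therefore fits every point exactly, with no residual degrees of freedom, so the weights cannot change the estimates.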

Let me illustrate with another example:

set.seed(45)
# meaningless data
df <- data.frame(x=runif(10), y=rep(1:3, c(4,3,3)))

summary(lm(x ~ factor(y), data=df))
# without weights 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept)  0.39256    0.09481   4.141  0.00435 **
# factor(y)2  -0.09999    0.14482  -0.690  0.51216   
# factor(y)3  -0.14433    0.14482  -0.997  0.35214   


summary(lm(x ~ factor(y), data=df, weights=1/y))
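# with weights: the estimates are unchanged because the weights are constant
# within each level of y; only the standard errors and p-values differ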
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.39256    0.07148   5.492 0.000914 ***
# factor(y)2  -0.09999    0.13687  -0.731 0.488798    
# factor(y)3  -0.14433    0.15983  -0.903 0.396528    
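Returning to the data in the question, the following sketch compares weighted and unweighted fits for both ways of coding Class. The values are copied from the question; the object names (sat_plain, line_weighted and so on) are only illustrative, and Class is assumed to hold the integers 0 to 4.

```r
# Values copied from the question; Class is kept numeric here and wrapped in
# factor() only where a categorical fit is wanted.
df <- data.frame(
  Class = 0:4,
  ML1   = c(0.8730469, 0.8471074, 0.8593750, 0.8875000, 0.7962963),
  ML1SD = c(0.3329205, 0.3598839, 0.3476343, 0.3159806, 0.4027512)
)

# Categorical Class: 5 parameters for 5 observations, so the fit is exact
# and the weights have nothing to trade off.
sat_plain    <- lm(ML1 ~ factor(Class), data = df)
sat_weighted <- lm(ML1 ~ factor(Class), data = df, weights = 1 / ML1SD^2)

# Numeric Class: 2 parameters for 5 observations, so the weights influence
# which points the line stays closest to.
line_plain    <- lm(ML1 ~ Class, data = df)
line_weighted <- lm(ML1 ~ Class, data = df, weights = 1 / ML1SD^2)

rbind(plain = coef(sat_plain),  weighted = coef(sat_weighted))
rbind(plain = coef(line_plain), weighted = coef(line_weighted))
```

The first table should show matching rows, which is the behaviour described in the question. The second should show a different intercept and slope once the weights are applied.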

Upvotes: 3
