Reputation: 814
I have a data frame that looks like
> t
          Institution Subject Class       ML1     ML1SD
aPhysics0           A Physics     0 0.8730469 0.3329205
aPhysics1           A Physics     1 0.8471074 0.3598839
aPhysics2           A Physics     2 0.8593750 0.3476343
aPhysics3           A Physics     3 0.8875000 0.3159806
aPhysics4           A Physics     4 0.7962963 0.4027512
I want to fit a linear function to ML1 against Class, but when I call
> lm(ML1 ~ Class, data=t)
I get:
Call:
lm(formula = ML1 ~ Class, data = t)
Coefficients:
(Intercept)       Class1       Class2       Class3       Class4
    0.87305     -0.02594     -0.01367      0.01445     -0.07675
I don't really understand this output: it looks like it is giving me a separate gradient value for each value of Class, yet there are 5 Class values (0-4) and only 4 such coefficients. What I want is a single intercept and a single gradient value.
Also, when I call lm with weights = 1/ML1SD^2 it does not change any of the values.
What am I doing wrong?
Upvotes: 1
Views: 122
Reputation: 118779
Class is being treated as a categorical variable by lm. I suppose that your Class column is a factor.
The result of the regression is correct in the sense that the four estimates correspond to the four non-reference classes. This is because, by default, the first level (0, in your case) is taken as the reference level, and all the estimates you obtain are relative to that reference. That is, mean(Class1) - mean(Class0) is equal to -0.02594.
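The reference-level interpretation above can be checked directly. The following is a minimal sketch that rebuilds only the Class and ML1 columns from the question (the data frame name d is an assumption made for illustration) and compares the fitted coefficients with the differences from the level-0 value:

```r
# Rebuild the question's data; Class is deliberately stored as a factor.
d <- data.frame(
  Class = factor(0:4),
  ML1   = c(0.8730469, 0.8471074, 0.8593750, 0.8875000, 0.7962963)
)

fit_factor <- lm(ML1 ~ Class, data = d)
coef(fit_factor)

# Each ClassK coefficient is the ML1 value at level K minus the value at
# level 0 (the reference level); the intercept is the level-0 value itself.
diffs <- d$ML1[-1] - d$ML1[1]
all.equal(unname(coef(fit_factor)[-1]), diffs)
```

Because there is only one observation per level here, the "mean" of each class is just that single ML1 value.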
If you don't want Class treated as a categorical variable and would rather model it as a continuous variable, you must convert it from factor to numeric (or integer) type by doing: df$Class <- as.numeric(as.character(df$Class)). Then you get:
> lm(ML1 ~ Class, data=df)
Call:
lm(formula = ML1 ~ Class, data = df)
Coefficients:
(Intercept)        Class
    0.87529     -0.01131
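The detour through as.character matters because calling as.numeric directly on a factor returns its internal level codes rather than its labels. A small sketch, using a hypothetical factor f whose labels mirror the Class values in the question:

```r
# Hypothetical factor whose labels are the numbers 0..4, as in the question.
f <- factor(c("0", "1", "2", "3", "4"))

as.numeric(f)                # internal level codes: 1 2 3 4 5
as.numeric(as.character(f))  # the labels themselves: 0 1 2 3 4

# An equivalent idiom that converts the levels once and then indexes them.
as.numeric(levels(f))[f]
```

With labels 0 to 4 the codes are only shifted by one, which would leave the slope unchanged and move the intercept, but for labels that are not consecutive integers the codes would give a different fit altogether.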
But are you sure Class is a continuous variable?
Edit: The weights parameter does have an effect. It performs a weighted linear least squares regression. You can see that for the categorical variable when you do summary(.):
summary(lm(ML1 ~ factor(Class), data = df, weights = 1/ML1SD^2))
But in your case, each level of the categorical variable has only one observation, so the model fits the data exactly and the weights cannot change the estimates.
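To make this concrete on the question's own numbers, here is a sketch that fits both the factor model and the numeric model with and without weights. The data frame d and the fit names are assumptions chosen for illustration, not names from the original post:

```r
# The question's data, with Class stored as a plain integer column.
d <- data.frame(
  Class = 0:4,
  ML1   = c(0.8730469, 0.8471074, 0.8593750, 0.8875000, 0.7962963),
  ML1SD = c(0.3329205, 0.3598839, 0.3476343, 0.3159806, 0.4027512)
)

# Factor model: five parameters for five points, so the fit is exact
# and the weights cannot move the estimates.
f_u <- lm(ML1 ~ factor(Class), data = d)
f_w <- lm(ML1 ~ factor(Class), data = d, weights = 1 / ML1SD^2)
all.equal(coef(f_u), coef(f_w))

# Numeric model: two parameters for five points, so the residuals are
# non-zero and the weights shift the intercept and slope.
n_u <- lm(ML1 ~ Class, data = d)
n_w <- lm(ML1 ~ Class, data = d, weights = 1 / ML1SD^2)
rbind(unweighted = coef(n_u), weighted = coef(n_w))
```

So once Class is numeric, the weights = 1/ML1SD^2 call from the question should start to change the fitted line.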
Let me illustrate with another example:
set.seed(45)
# meaningless data
df <- data.frame(x=runif(10), y=rep(1:3, c(4,3,3)))

# without weights
summary(lm(x ~ factor(y), data=df))
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.39256    0.09481   4.141  0.00435 **
# factor(y)2  -0.09999    0.14482  -0.690  0.51216
# factor(y)3  -0.14433    0.14482  -0.997  0.35214

# with weights
summary(lm(x ~ factor(y), data=df, weights=1/y))
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.39256    0.07148   5.492 0.000914 ***
# factor(y)2  -0.09999    0.13687  -0.731 0.488798
# factor(y)3  -0.14433    0.15983  -0.903 0.396528
Upvotes: 3