Reputation: 7117
I am running linear models to look at the significance of independent factors involved. The example model is: `
mymod1 <- lm(temp ~ bgrp+psex+tb,data=mydat)
summary(mymod1)`
I look at the summary to check out the significance of each factor:
lm(formula = temp ~ bgrp + psex + tb, data = mydat)
Residuals:
Min 1Q Median 3Q Max
-5.6877 -0.2454 0.0768 0.3916 1.6561
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.324459 0.186081 200.581 < 2e-16 ***
bgrp 0.256794 0.066167 3.881 0.000115 ***
psex 0.144669 0.055140 2.624 0.008913 **
tb 0.019818 0.009342 2.121 0.034287 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6888 on 621 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.03675, Adjusted R-squared: 0.03209
F-statistic: 7.897 on 3 and 621 DF, p-value: 3.551e-05
Now, I would like to look at the solutions of the two levels of bgrp (1 and 2) and psex (1 and 2).
I would appreciate if you could help me with this.
Thanking you in advance,
Baz
EDIT:
I ran the first model you suggested and got the following:
mydat$bgrp <- as.factor(mydat$bgrp)
> summary(lm(temp ~ bgrp+psex+tb-1,data=mydat))
Call:
lm(formula = temp ~ bgrp + psex + tb - 1, data = apirt)
Residuals:
Min 1Q Median 3Q Max
-5.6877 -0.2454 0.0768 0.3916 1.6561
Coefficients:
Estimate Std. Error t value Pr(>|t|)
bgrp1 37.725922 0.135486 278.449 < 2e-16 ***
bgrp2 37.982716 0.129558 293.171 < 2e-16 ***
psex2 0.144669 0.055140 2.624 0.00891 **
tb 0.019818 0.009342 2.121 0.03429 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6888 on 621 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 4.788e+05 on 4 and 621 DF, p-value: < 2.2e-16
From the above coefficient table, bgrp1 and bgrp2 seem to make sense: bgrp1 represents the maternal lines with larger litter sizes, lighter offsprings, which results in lower rectal temperature (37.70 degrees C) of the offspring. On the other hand, bgrp2 represents the terminal lines with smaller litter size, heavier offsprings, which results in higher a rectal temperature (37.98 degrees C). I am just wondering, if the same could be done for psex1 and psex2, but what is presented in the table of coefficients could be due to what you said earlier.
EDIT: Hi Mark,
I tried the two options you suggested and I could see that bgrp1 and psex1 are taking on the same values:
> mybgrp <- lm(formula = temp ~ bgrp+psex+tb-1, data = mydat)
> mybgrp
Call:
lm(formula = temp ~ bgrp + psex + tb - 1, data = mydat)
Coefficients:
bgrp1 bgrp2 psex2 tb
37.72592 37.98272 0.14467 0.01982
> summary(mybgrp)
Call:
lm(formula = temp ~ bgrp + psex + tb - 1, data = mydat)
Residuals:
Min 1Q Median 3Q Max
-5.6877 -0.2454 0.0768 0.3916 1.6561
Coefficients:
Estimate Std. Error t value Pr(>|t|)
bgrp1 37.725922 0.135486 278.449 < 2e-16 ***
bgrp2 37.982716 0.129558 293.171 < 2e-16 ***
psex2 0.144669 0.055140 2.624 0.00891 **
tb 0.019818 0.009342 2.121 0.03429 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6888 on 621 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 4.788e+05 on 4 and 621 DF, p-value: < 2.2e-16
> mypsex <- lm(formula = temp ~ psex+bgrp+tb-1, data = mydat)
> mypsex
Call:
lm(formula = temp ~ psex + bgrp + tb - 1, data = mydat)
Coefficients:
psex1 psex2 bgrp2 tb
37.72592 37.87059 0.25679 0.01982
> summary(mypsex)
Call:
lm(formula = temp ~ psex + bgrp + tb - 1, data = mydat)
Residuals:
Min 1Q Median 3Q Max
-5.6877 -0.2454 0.0768 0.3916 1.6561
Coefficients:
Estimate Std. Error t value Pr(>|t|)
psex1 37.725922 0.135486 278.449 < 2e-16 ***
psex2 37.870591 0.135908 278.649 < 2e-16 ***
bgrp2 0.256794 0.066167 3.881 0.000115 ***
tb 0.019818 0.009342 2.121 0.034287 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6888 on 621 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
F-statistic: 4.788e+05 on 4 and 621 DF, p-value: < 2.2e-16
Thanks!
Upvotes: 1
Views: 3990
Reputation: 13113
If there are only two levels to a variable (1 vs 2), that is the same as (0 vs 1) and the slope is for one of those 2 levels. The other level of the variable is included in the intercept term.
Maybe try
lm(formula = temp ~ bgrp + psex + tb - 1 , data = mydat)
to remove the intercept and see if that gives you what you want.
Then again, maybe I do not understand your question correctly.
Edit:
When I use fake data and set
bgrp <- as.factor(bgrp)
psex <- as.factor(psex)
without an intercept I get 2 slopes for one of the 2 factors. I believe R is holding the second slope for the second factor = 0.
Edit2:
This model will provide separate slopes for each combination of bgrp and psex. The model includes an interaction between bgrp and psex and then removes the intercept and the bgrp and psex main effects:
mymod2 <- lm(temp ~ bgrp + psex + bgrp * psex + tb - 1 - bgrp - psex)
Edit3:
If you are used to using SAS and try running the same analysis in SAS and R you might find that the two programs initially do not return the same estimates. That may be because SAS and R select different factor levels for the intercept by default. You can alter the default factor level for the intercept in R to match that used by SAS and then you might find that both programs give the same answer.
Compare the following R code with the SAS output from here:
http://support.sas.com/kb/38/384.html
where the SAS code makes use of the option 'solution':
my.data <- matrix(c(
'A', 'F', 9, 25,
'A', 'F', 3, 19,
'A', 'F', 4, 18,
'A', 'F', 11, 28,
'A', 'F', 7, 23,
'A', 'M', 11, 27,
'A', 'M', 9, 24,
'A', 'M', 9, 25,
'A', 'M', 10, 28,
'A', 'M', 10, 26,
'D', 'F', 4, 37,
'D', 'F', 12, 54,
'D', 'F', 3, 33,
'D', 'F', 6, 41,
'D', 'F', 9, 47,
'D', 'M', 5, 36,
'D', 'M', 4, 36,
'D', 'M', 7, 40,
'D', 'M', 10, 46,
'D', 'M', 8, 42,
'G', 'F', 10, 70,
'G', 'F', 11, 75,
'G', 'F', 7, 60,
'G', 'F', 9, 69,
'G', 'F', 10, 71,
'G', 'M', 3, 47,
'G', 'M', 8, 60,
'G', 'M', 11, 70,
'G', 'M', 4, 49,
'G', 'M', 4, 50
), nrow = 30, byrow=T,
dimnames = list(NULL, c("drug","gender","x","y")));
my.data <- as.data.frame(my.data, stringsAsFactors=F)
my.data
my.data$y <- as.numeric(my.data$y)
my.data$x <- as.numeric(my.data$x)
my.data$drug <- as.factor(my.data$drug)
my.data$gender <- as.factor(my.data$gender)
str(my.data)
my.data$drug <- relevel(my.data$drug, ref="G")
my.data$gender <- relevel(my.data$gender, ref="M")
my.mod1 <- lm(my.data$y ~ my.data$drug)
my.mod1
summary(my.mod1)
my.mod2 <- lm(my.data$y ~ my.data$drug-1)
my.mod2
summary(my.mod2)
my.mod3 <- lm(my.data$y ~ my.data$drug + my.data$gender +
my.data$drug * my.data$gender )
my.mod3
summary(my.mod3)
my.mod4 <- lm(my.data$y ~ my.data$drug + my.data$gender +
my.data$drug * my.data$gender - 1 )
my.mod4
summary(my.mod4)
my.mod5 <- lm(my.data$y ~ my.data$drug + my.data$x +
my.data$drug * my.data$x )
my.mod5
summary(my.mod5)
my.mod6 <- lm(my.data$y ~ my.data$drug + my.data$x +
my.data$drug * my.data$x - 1 )
my.mod6
summary(my.mod6)
Upvotes: 1