Reputation: 395
I've read some tutorials about the lm() function in R and I am a little confused about how this function deals with continuous versus discrete predictors. In https://www.r-bloggers.com/r-tutorial-series-simple-linear-regression/, for a continuous predictor, the coefficients represent the intercept and the slope of the linear regression.
This is clear, but if I now have a categorical variable such as gender, with values 0 or 1, how does the lm() function work? Does the function apply a logistic regression, or is it still possible to use the function in this way?
Upvotes: 1
Views: 2008
Reputation: 24069
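Exactly what you are looking for is unclear from your question, but yes, you can use the lm function with a categorical variable. It still fits an ordinary linear model, not a logistic regression. With an interaction term, the resulting equation is a baseline linear fit for the reference level plus an adjustment to the intercept and slope for the other level.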
It is best to illustrate with an example. Using made up data:
x <- 1:10
y1 <- x + rnorm(10, 0, 0.1)
y2 <- 14 - x + rnorm(10, 0, 0.1)
f <- rep(c("A", "B"), each = 10)
df <- data.frame(x = c(x, x), y = c(y1, y2), f)
#Model 1
print(lm(y1~x))
# lm(formula = y1 ~ x)
#
# Coefficients:
# (Intercept) x
# 0.1703 0.9754
#Model 2
model <- lm(y ~ x * f, data = df)
print(model)
# lm(formula = y ~ x * f, data = df)
#
# Coefficients:
#(Intercept) x fB x:fB
# 0.1703 0.9754 13.7622 -1.9709
#Model 3
print(lm(y2~x))
# lm(formula = y2 ~ x)
#
# Coefficients:
# (Intercept) x
# 13.9325 -0.9955
After running the code above and comparing Models 1 and 2, you can see that the intercept and the x slope are the same. This is because when the factor is A (the reference level, coded 0), the fB and x:fB terms are multiplied by 0 and drop out. When the factor is B, the fB and x:fB coefficients apply and are added to the model.
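To see the 0/1 coding directly, one option is to inspect the design matrix that lm builds from the formula. This is a minimal sketch on made-up data with the same structure as above; the set.seed call and the generated values are my own assumptions, so the numbers will not match the output printed earlier.

```r
# Made-up data with the same structure as the example above.
# set.seed() is an assumption added here only for reproducibility.
set.seed(1)
x <- 1:10
df <- data.frame(
  x = c(x, x),
  y = c(x + rnorm(10, 0, 0.1), 14 - x + rnorm(10, 0, 0.1)),
  f = factor(rep(c("A", "B"), each = 10))
)

# model.matrix() returns the design matrix lm() would use for this formula.
mm <- model.matrix(y ~ x * f, data = df)

head(mm, 2)  # rows for level A: the fB and x:fB columns are 0
tail(mm, 2)  # rows for level B: fB is 1 and x:fB equals x
```

The fB column is the indicator for level B, and the x:fB column is simply x multiplied by that indicator.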
If you add the intercept and fB together, and add the x slope to x:fB, the results are the intercept and slope of Model 3.
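That addition can be checked numerically. The sketch below regenerates made-up data (the seed is an assumption, so the values differ from the printed output above) and compares the summed coefficients from the interaction model with a separate fit on the level B data.

```r
# Made-up data; set.seed() is an assumption added for reproducibility.
set.seed(1)
x  <- 1:10
y1 <- x + rnorm(10, 0, 0.1)
y2 <- 14 - x + rnorm(10, 0, 0.1)
df <- data.frame(x = c(x, x), y = c(y1, y2),
                 f = factor(rep(c("A", "B"), each = 10)))

b  <- coef(lm(y ~ x * f, data = df))  # interaction model (Model 2)
b3 <- coef(lm(y2 ~ x))                # separate fit for level B (Model 3)

# Intercept and slope for level B, rebuilt from the interaction model.
intercept_B <- unname(b["(Intercept)"] + b["fB"])
slope_B     <- unname(b["x"] + b["x:fB"])

all.equal(intercept_B, unname(b3["(Intercept)"]))
all.equal(slope_B, unname(b3["x"]))
```

Both comparisons should agree up to floating-point tolerance, because a model with a full interaction gives each level its own intercept and slope.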
I hope this helps and did not cloud your understanding.
Upvotes: 2