Reputation: 828
I'm trying to understand some code that builds a model matrix in R, but I'm having trouble with some basic syntax.
Here's some reproducible code:
test_df <- data.frame(
  category = c("Poetry", "Narrative Film", "Music"),
  country = c("GB", "US", "US"),
  usd_goal_real = c(1534, 30000, 45000),
  time_int = c(59, 60, 45),
  state = c(0, 0, 0)
)
test_df2 <- data.frame(model.matrix(~ . - 1, test_df))
test_df3 <- data.frame(model.matrix(~ ., test_df))
What exactly is specified in the line test_df2 <- data.frame(model.matrix( ~ . -1, test_df))?
Specifically, what does the ~ . -1 mean? Is this excluding a field from the model? How does it differ from the formula ~ . in the next line?
Upvotes: 0
Views: 85
Reputation: 413
The simplest answer is that the -1 in the formula passed to model.matrix removes the intercept term from the model.
data.frame(model.matrix( ~ . -1, test_df))
produces:
  categoryMusic categoryNarrative.Film categoryPoetry countryUS usd_goal_real time_int state
1             0                      0              1         0          1534       59     0
2             0                      1              0         1         30000       60     0
3             1                      0              0         1         45000       45     0
and data.frame(model.matrix( ~ . , test_df))
produces:
  X.Intercept. categoryNarrative.Film categoryPoetry countryUS usd_goal_real time_int state
1            1                      0              1         0          1534       59     0
2            1                      1              0         1         30000       60     0
3            1                      0              0         1         45000       45     0
Since there is a categorical variable in the model, you will also notice that the Music level of category disappears when the intercept is included. The first level of the variable is absorbed into the intercept as the reference level, and all other levels are measured relative to it.
These are two different ways of parameterizing your model.
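To see the difference without reading the printed tables, here is a minimal sketch that compares the column names of the two matrices directly. It reuses test_df from the question; the names mm_no_int, mm_int, only_without_intercept, only_with_intercept, category_cols and dummy_row_sums are made up for this illustration. It works on the raw output of model.matrix, where the intercept column is named (Intercept); wrapping the result in data.frame() is what turns that name into X.Intercept.

```r
# Same data as in the question
test_df <- data.frame(
  category = c("Poetry", "Narrative Film", "Music"),
  country = c("GB", "US", "US"),
  usd_goal_real = c(1534, 30000, 45000),
  time_int = c(59, 60, 45),
  state = c(0, 0, 0)
)

mm_no_int <- model.matrix(~ . - 1, test_df)  # intercept removed
mm_int    <- model.matrix(~ ., test_df)      # intercept kept (the default)

# Columns that appear in one matrix but not in the other
only_without_intercept <- setdiff(colnames(mm_no_int), colnames(mm_int))
only_with_intercept    <- setdiff(colnames(mm_int), colnames(mm_no_int))
print(only_without_intercept)
print(only_with_intercept)

# Without the intercept, category gets one dummy column per level.
# Summing those dummies row by row gives 1 for every row, which is
# exactly the column of ones that the intercept would have supplied.
category_cols  <- grep("^category", colnames(mm_no_int), value = TRUE)
dummy_row_sums <- rowSums(mm_no_int[, category_cols])
print(dummy_row_sums)
```

The only columns that differ should be categoryMusic (present without the intercept) and (Intercept) (present with it). That is why the two formulas are best read as two encodings of the same information, not as one of them dropping a field from the data.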
Upvotes: 1