Reputation: 1979
I found similar entries but not exactly what I want. For two categorized variable (e.g., gender(1,2)), I need to create a dummy variable, 0s being male and 1s being female.
Here how my data look like and what I did.
data <- as.data.frame(as.matrix(c(1,2,2,1,2,1,1,2),8,1))
V1
1 1
2 2
3 2
4 1
5 2
6 1
7 1
8 2
library(dummies)
data <- cbind(data, dummy(data$V1, sep = "_"))
> data
V1 data_1 data_2
1 1 1 0
2 2 0 1
3 2 0 1
4 1 1 0
5 2 0 1
6 1 1 0
7 1 1 0
8 2 0 1
In this code, the second category is also (0,1). Also, is there a way to determine which to determine the baseline (assigning 0 to any category)?
I want it to look like this:
> data
V1 V1_dummy
1 1 0
2 2 1
3 2 1
4 1 0
5 2 1
6 1 0
7 1 0
8 2 1
Also, I want to extend this to three category variables, having two categories after recoding (n-1).
Thanks in advance!
Upvotes: 2
Views: 390
Reputation: 48191
You can use model.matrix
in the following way. Some sample data with a three level factor:
set.seed(1)
(df <- data.frame(x = factor(rbinom(5, 2, 0.4))))
# x
# 1 0
# 2 1
# 3 1
# 4 2
# 5 0
Then
model.matrix(~ x, df)[, -1]
# x1 x2
# 1 0 0
# 2 1 0
# 3 1 0
# 4 0 1
# 5 0 0
If you want to specify which group disappears, we need to rearrange the factor levels. It is the first group that disappears. So, e.g.,
levels(df$x) <- c("1", "0", "2")
model.matrix(~x, df)[, -1]
# x0 x2
# 1 0 0
# 2 1 0
# 3 1 0
# 4 0 1
# 5 0 0
Upvotes: 1