Reputation: 747
I'm working on a problem where I have 100+ items, and the result contains those items organized in groups of 2, n times. I want to create an n x 100 dummy matrix for the result dataset that indicates whether or not each item was in a specific "run". I can usually do this easily with model.matrix; however, my results sometimes don't contain every item, and I want the columns for those missing items to be all 0's. Example:
library(dplyr)
AllIDs <- c('A', 'B', 'C', 'D', 'E', 'G', 'H')
resultID <- c('D', 'A', 'C', 'G', 'A', 'H')
resultRun <- rep(1:3, each = 2)
resultDF <- data.frame(resultRun, resultID, stringsAsFactors = FALSE)
modelMat <- model.matrix(~resultDF$resultID)
dummyDF <- resultDF %>%
  # group_by(resultRun) %>%
  mutate(A = ifelse(resultID == 'A', 1, 0),
         B = ifelse(resultID == 'B', 1, 0),
         C = ifelse(resultID == 'C', 1, 0),
         D = ifelse(resultID == 'D', 1, 0),
         E = ifelse(resultID == 'E', 1, 0),
         G = ifelse(resultID == 'G', 1, 0),
         H = ifelse(resultID == 'H', 1, 0)) %>%
  group_by(resultRun) %>%
  summarise(A = sum(A),
            B = sum(B),
            C = sum(C),
            D = sum(D),
            E = sum(E),
            G = sum(G),
            H = sum(H))
Notice that even if I cleaned the intercept of modelMat to be the dummy vector for A, it's still missing B, since B isn't in the results. dummyDF is exactly how I want it to look, but the process is way too cumbersome. My actual problem has 100+ "IDs", and they often change every minute. I can't keep updating the pipeline to include the different items.
I'd like to use a function that returns dummyDF with dummy vectors for every input in AllIDs. Any help would be much appreciated.
Upvotes: 1
Views: 115
Reputation: 887193
We can do this easily by converting the 'resultID' column to a factor with its levels specified and then building the table:
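# Note: LETTERS[1:8] also includes 'F', which is not in AllIDs;
# use levels = AllIDs to get exactly the columns listed in the question.
resultDF$resultID <- factor(resultID, levels = LETTERS[1:8])
cbind(resultRun = unique(resultDF$resultRun), as.data.frame.matrix(+(table(resultDF) != 0)))
#   resultRun A B C D E F G H
# 1         1 1 0 0 1 0 0 0 0
# 2         2 0 0 1 0 0 0 1 0
# 3         3 1 0 0 0 0 0 0 1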
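Since the question asks for a function, the same factor-plus-table idea could be wrapped up roughly like this. This is only a sketch: make_dummy_df is a made-up name, and it assumes the input has columns called resultRun and resultID as in the example. It takes the levels from AllIDs rather than LETTERS[1:8], so there is no extra F column.

```r
# Hypothetical helper (name and argument names are illustrative only).
# Assumes result_df has columns `resultRun` and `resultID`.
make_dummy_df <- function(result_df, all_ids) {
  # Fixing the factor levels keeps a column for IDs absent from the results.
  ids <- factor(result_df$resultID, levels = all_ids)
  # Cross-tabulate runs against IDs; rows come out in sorted run order.
  counts <- table(resultRun = result_df$resultRun, resultID = ids)
  # +(counts != 0) converts the counts to 0/1 indicators.
  dummy <- as.data.frame.matrix(+(counts != 0))
  cbind(resultRun = sort(unique(result_df$resultRun)), dummy)
}

AllIDs <- c('A', 'B', 'C', 'D', 'E', 'G', 'H')
resultDF <- data.frame(resultRun = rep(1:3, each = 2),
                       resultID = c('D', 'A', 'C', 'G', 'A', 'H'),
                       stringsAsFactors = FALSE)
dummyDF <- make_dummy_df(resultDF, AllIDs)
```

When the set of IDs changes, only the AllIDs vector needs updating. One caveat: any resultID value that is not in all_ids becomes NA in the factor and is silently dropped by table().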
Upvotes: 1