CoolGuyHasChillDay

Reputation: 747

Creating dummy matrix from vector NOT in data frame

I'm working on a problem with 100+ items, where the result contains those items arranged in n groups ("runs") of 2. I want to create an n x 100 dummy matrix for the result dataset that indicates whether or not each item was in a specific run. I can usually do this easily with model.matrix, but my results sometimes don't contain every item, and I want the columns for the missing items to be all 0s. Example:

library(dplyr)
AllIDs <- c('A', 'B', 'C', 'D', 'E', 'G', 'H')

resultID <- c('D', 'A', 'C', 'G', 'A', 'H')
resultRun <- rep(1:3, each = 2)
resultDF <- data.frame(resultRun, resultID, stringsAsFactors = FALSE)

modelMat <- model.matrix(~resultDF$resultID)

dummyDF <- resultDF %>% 
  # group_by(resultRun) %>% 
  mutate(A = ifelse(resultID == 'A', 1, 0),
         B = ifelse(resultID == 'B', 1, 0),
         C = ifelse(resultID == 'C', 1, 0),
         D = ifelse(resultID == 'D', 1, 0),
         E = ifelse(resultID == 'E', 1, 0),
         G = ifelse(resultID == 'G', 1, 0),
         H = ifelse(resultID == 'H', 1, 0)) %>% 
  group_by(resultRun) %>% 
  summarise(A = sum(A),
            B = sum(B),
            C = sum(C),
            D = sum(D),
            E = sum(E),
            G = sum(G),
            H = sum(H))

Notice that even if I cleaned up the intercept of modelMat so that it became the dummy vector for A, the matrix would still be missing B, since B isn't in the results. dummyDF looks exactly how I want it to, but the process is far too cumbersome. My actual problem has 100+ IDs, and they often change every minute, so I can't keep updating the pipeline to list the current items by hand.

I'd like to use a function that returns dummyDF with dummy vectors for every input in AllIDs. Any help would be much appreciated.

Upvotes: 1

Views: 115

Answers (1)

akrun

Reputation: 887193

We can do this easily by converting the 'resultID' column to a factor with its levels specified explicitly, and then tabulating the data frame:

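# Use AllIDs as the levels so that IDs absent from the results (B, E) still get a column
resultDF$resultID <- factor(resultDF$resultID, levels = AllIDs)
# table() orders its rows by sorted run, so sort the run labels to match
cbind(resultRun = sort(unique(resultDF$resultRun)), as.data.frame.matrix(+(table(resultDF) != 0)))
#  resultRun A B C D E G H
#1         1 1 0 0 1 0 0 0
#2         2 0 0 1 0 0 1 0
#3         3 1 0 0 0 0 0 1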
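
Since the question asks for a function, the same factor-plus-table idea can be wrapped up as a rough sketch. The name make_dummy_df and its arguments are hypothetical (they are not part of the answer above), and it assumes base R only and that the run labels can be read back from the table's row names:

```r
# Hypothetical helper: one row per run, one 0/1 column per ID in all_ids,
# including IDs that never appear in the results.
make_dummy_df <- function(run, id, all_ids) {
  # Fixing the levels keeps absent IDs as all-zero columns
  id <- factor(id, levels = all_ids)
  tab <- table(run, id)
  # Turn counts into 0/1 indicators and drop the table class
  ind <- as.data.frame.matrix(+(tab != 0))
  # Recover the run labels from the table's row names (sorted by table())
  runs <- type.convert(rownames(tab), as.is = TRUE)
  out <- cbind(resultRun = runs, ind)
  rownames(out) <- NULL
  out
}

AllIDs <- c('A', 'B', 'C', 'D', 'E', 'G', 'H')
resultID <- c('D', 'A', 'C', 'G', 'A', 'H')
resultRun <- rep(1:3, each = 2)
dummyDF <- make_dummy_df(resultRun, resultID, AllIDs)
```

Because the columns come from all_ids rather than from the data, B and E show up as all-zero columns here, and changing the ID list only means passing a different vector. Note that this returns presence indicators; if an ID could appear more than once in a run and counts are wanted (as with sum() in the question), the `+(tab != 0)` step could be replaced by `tab` itself.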

Upvotes: 1
