Reputation: 7928
I have a vector of length m (in this example m = 10) with IDs:
set.seed(12222017)
library(dplyr)
N <- 100
IDs <- do.call(paste0, replicate(7, sample(LETTERS, 10, TRUE), FALSE))
And I have a data frame with 1+J columns and N rows (here J = 3 and N = 100):
df1 <- data.frame(DRAW = 1:N,
                  V1 = sample(IDs, N, replace = TRUE),
                  V2 = sample(IDs, N, replace = TRUE),
                  V3 = sample(IDs, N, replace = TRUE)) %>%
  mutate(V1 = as.character(V1), V2 = as.character(V2), V3 = as.character(V3))
I want to use that data to generate a new data frame like the following, with one 0/1 column per ID:
   DRAW OYKGVZZ OWGNEYU MGPARZW GZXTXFV IXNGUCE QMYFNVZ FLZPQDJ XXSOCZZ QHBSIFX GQBZNGQ
1:    1       1       0       0       0       1       0       0       0       1       0
2:    2       0       0       0       0       1       0       0       1       0       1
3:    3       0       0       0       1       0       0       0       1       0       1
4:    4       0       0       1       0       0       0       0       1       1       0
5:    5       0       0       0       0       1       0       1       1       0       0
6:    6       0       0       0       1       0       1       0       0       0       0
I can do this with this code:
# Check whether idX appears in any of the V columns of row DRAW of df1
checkRowXidX <- function(DRAW, idX){
  check <- idX %in% df1[DRAW, -1]
  data.frame(DRAW = DRAW, idX = idX, check = as.numeric(check))
}
# Every (DRAW, ID) combination: N * m function calls
tests <- expand.grid(df1$DRAW, IDs)
checks <- purrr::map2(tests$Var1, tests$Var2, checkRowXidX) %>%
  data.table::rbindlist() %>%
  tidyr::spread(idX, check)
checks %>% head
Is there a more efficient way of doing this? In practice, I will be working with bigger data and this approach would take a while to run.
Upvotes: 0
Views: 127
Reputation: 28329
One of many solutions is dcast() from reshape2:
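# Using OP's data
library(reshape2)
# melt() stacks V1..V3 into a long "value" column keyed by DRAW;
# dcast() then spreads the IDs into columns. With fun.aggregate = length
# each cell is a count, so it exceeds 1 when an ID is drawn more than once in a row.
dcast(melt(df1, id.vars = "DRAW"), DRAW ~ value,
      fun.aggregate = length, value.var = "value")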
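If strict 0/1 indicators are needed even when an ID occurs more than once in a row, a vectorised base R comparison is another option. This is only a minimal sketch on a small made-up data set: the names ids, df, ind and result are hypothetical and not from the question, and I have not benchmarked it against the dcast() approach.

```r
# Hypothetical toy data (not the OP's df1) so the sketch runs on its own
ids <- c("AAA", "BBB", "CCC")
df <- data.frame(DRAW = 1:3,
                 V1 = c("AAA", "BBB", "CCC"),
                 V2 = c("AAA", "CCC", "CCC"),
                 V3 = c("BBB", "CCC", "CCC"),
                 stringsAsFactors = FALSE)

# For each ID, compare it against all V columns at once and flag the rows
# where it appears at least once (repeated hits still give 1, not a count)
ind <- sapply(ids, function(id) as.integer(rowSums(df[-1] == id) > 0))

# Bind the indicator matrix to the DRAW column, keeping the IDs as column names
result <- data.frame(DRAW = df$DRAW, ind, check.names = FALSE)
print(result)
#   DRAW AAA BBB CCC
# 1    1   1   1   0
# 2    2   0   1   1
# 3    3   0   0   1
```

On the question's data the same idea would use IDs in place of ids and df1 in place of df. It loops over the m IDs only, instead of over all N * m (DRAW, ID) pairs.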
Upvotes: 2