Ignacio

Reputation: 7928

Check if id is present on columns of a data frame for each id and each row

I have a vector of length m (in this example m = 10) with IDs:

set.seed(12222017)
library(dplyr)
N <- 100
IDs <- do.call(paste0, replicate(7, sample(LETTERS, 10, TRUE), FALSE))

And I have a data frame with 1+J columns and N rows (here J = 3 and N = 100):

df1 <- data.frame(DRAW = 1:N, V1 = sample(IDs, N, replace = TRUE), 
                  V2 = sample(IDs, N, replace = TRUE), 
                  V3 = sample(IDs, N, replace = TRUE)) %>% 
  mutate(V1 = as.character(V1), V2 = as.character(V2), V3 = as.character(V3))

I want to use that data to generate a new data frame like the following:

   DRAW OYKGVZZ OWGNEYU MGPARZW GZXTXFV IXNGUCE QMYFNVZ FLZPQDJ XXSOCZZ QHBSIFX GQBZNGQ
1:    1       1       0       0       0       1       0       0       0       1       0
2:    2       0       0       0       0       1       0       0       1       0       1
3:    3       0       0       0       1       0       0       0       1       0       1
4:    4       0       0       1       0       0       0       0       1       1       0
5:    5       0       0       0       0       1       0       1       1       0       0
6:    6       0       0       0       1       0       1       0       0       0       0

I can do this with this code:

checkRowXidX <- function(DRAW, idX){
  # 1 if idX appears in any of the V columns of row DRAW, 0 otherwise
  check <- idX %in% unlist(df1[DRAW, -1])
  data.frame(DRAW = DRAW, idX = idX, check = as.numeric(check))
}

tests <- expand.grid(df1$DRAW,IDs)

checks <- purrr::map2(tests$Var1, tests$Var2, checkRowXidX) %>% 
  data.table::rbindlist() %>% tidyr::spread(idX, check)

checks %>% head

Is there a more efficient way of doing this? In practice, I will be working with bigger data and this approach would take a while to run.

Upvotes: 0

Views: 127

Answers (1)

pogibas

Reputation: 28329

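One of many solutions is to melt the data to long format and cast it back to wide with dcast() from reshape2:

# Using the OP's data
library(reshape2)
# Reshape to long format (one row per DRAW/ID pair), then count each ID per DRAW
long <- melt(df1, id.vars = "DRAW")
dcast(long, DRAW ~ value, fun.aggregate = length, value.var = "value")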
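
Note that a cast which counts entries can report 2 or 3 rather than 1 when the same ID appears in more than one column of a row. If a strict 0/1 presence indicator is needed, one possible base R sketch is shown below. The names `ids` and `df_small` are made-up stand-ins for the question's `IDs` and `df1`, used here only so the example is self-contained.

```r
# Hypothetical small example standing in for the question's data
ids <- c("AAA", "BBB", "CCC")
df_small <- data.frame(
  DRAW = 1:3,
  V1 = c("AAA", "BBB", "CCC"),
  V2 = c("AAA", "CCC", "CCC"),
  V3 = c("BBB", "CCC", "AAA"),
  stringsAsFactors = FALSE
)

# Compare the whole block of V columns against each ID at once,
# then flag rows where at least one column matches
value_cols <- as.matrix(df_small[-1])
present <- sapply(ids, function(id) as.integer(rowSums(value_cols == id) > 0))

# One row per DRAW, one 0/1 column per ID
checks_small <- data.frame(DRAW = df_small$DRAW, present, check.names = FALSE)
checks_small
```

This works on whole columns rather than looping over every DRAW/ID pair, so it should scale better than the row-by-row approach in the question, although that has not been benchmarked here.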

Upvotes: 2
