Data matching in R

Question

I have two dataframes with the same dimensions (1000 rows, 200 columns). In both dataframes, each row is a person. In one dataframe, each column is a binary item score (i.e. 0 or 1). In the other dataframe, each cell holds the label of the item that was scored in the same position of the first dataframe. Here is a small example:

Dataframe 1:

item1 item2 item3
0     1     1
1     0     0
1     1     1

Dataframe 2:

item1   item2   item3
C2HSD   WW11S3  EI22S
WW11S3  2JDDS   TT6SQ1
EI22S   TT6SQ1  331ID

What I want is a combined and matched dataframe like this:

C2HSD  WW11S3 EI22S 2JDDS TT6SQ1 331ID
0      1      1     NA    NA     NA
NA     1      NA    0     0      NA
NA     NA     1     NA    1      1

Thank you!

akrun · Accepted Answer

We can melt the two datasets to 'long' format, do a left_join on the row and column indices, and then spread the result back to 'wide' format after removing 'Var2'.

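library(reshape2)
library(tidyverse)
# melt each matrix to long format: Var1 = row (person), Var2 = column (item), value = cell
d1 <- melt(as.matrix(df1))
d2 <- melt(as.matrix(df2))
# join labels (value.x) to scores (value.y) by person and item, then widen by label
left_join(d2, d1, by = c("Var1", "Var2")) %>%
      select(-Var2) %>%
      spread(value.x, value.y) %>%
      select(-Var1)
#   2JDDS 331ID C2HSD EI22S TT6SQ1 WW11S3
#1    NA    NA     0     1     NA      1
#2     0    NA    NA    NA      0      1
#3    NA     1    NA     1      1     NA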
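
For readers on a recent tidyr, where spread is documented as superseded by pivot_wider, the same melt / join / widen idea can be sketched with pivot_longer and pivot_wider. This is only an illustration under assumptions: the example data are retyped from the question, the helper column names person, item, score and label are made up here, and each label is assumed to occur at most once per person.

```r
library(dplyr)
library(tidyr)

# Example data retyped from the question (3 persons, 3 item positions)
df1 <- data.frame(item1 = c(0, 1, 1), item2 = c(1, 0, 1), item3 = c(1, 0, 1))
df2 <- data.frame(item1 = c("C2HSD", "WW11S3", "EI22S"),
                  item2 = c("WW11S3", "2JDDS", "TT6SQ1"),
                  item3 = c("EI22S", "TT6SQ1", "331ID"),
                  stringsAsFactors = FALSE)

# Long format: one row per person and item position
long_scores <- df1 %>%
  mutate(person = row_number()) %>%
  pivot_longer(-person, names_to = "item", values_to = "score")
long_labels <- df2 %>%
  mutate(person = row_number()) %>%
  pivot_longer(-person, names_to = "item", values_to = "label")

# Attach each score to its label, then widen to one column per label
wide <- left_join(long_labels, long_scores, by = c("person", "item")) %>%
  select(-item) %>%
  pivot_wider(names_from = label, values_from = score) %>%
  select(-person)

wide
```

With the default settings, pivot_wider orders the new columns by first appearance of each label rather than alphabetically, so the column order can differ from the spread output.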

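A base R option would be to replace the values in each column of 'df2' with NA wherever the corresponding 'df1' value is 0 using Map, stack the result into a two-column 'data.frame', convert the 'values' column to a factor (with levels in order of first appearance), and get the frequencies with table. Note that this gives counts indexed by item column (item1, item2, ...) rather than one row per person, and a score of 0 is not distinguished from a label that never appeared.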

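# labels in order of first appearance, used as the factor levels
un1 <- unique(unlist(df2))
# blank out labels scored 0, stack to long form, then cross-tabulate item by label
table(transform(stack(Map(function(x, y) replace(y, !x, NA),
  df1, df2))[2:1], values = factor(values, levels = un1)))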
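
If a person-by-label matrix that keeps 0, 1 and NA apart is needed in base R, one possible sketch (not part of the answer above) fills an NA matrix by matrix indexing. The example data are retyped from the question, the names scores, labels, lev, idx and out are made up for illustration, and each label is assumed to occur at most once per person (otherwise later cells overwrite earlier ones).

```r
# Example data retyped from the question (3 persons, 3 item positions)
df1 <- data.frame(item1 = c(0, 1, 1), item2 = c(1, 0, 1), item3 = c(1, 0, 1))
df2 <- data.frame(item1 = c("C2HSD", "WW11S3", "EI22S"),
                  item2 = c("WW11S3", "2JDDS", "TT6SQ1"),
                  item3 = c("EI22S", "TT6SQ1", "331ID"),
                  stringsAsFactors = FALSE)

scores <- as.matrix(df1)
labels <- as.matrix(df2)

# One output column per distinct label, in order of first appearance
lev <- unique(as.vector(labels))
out <- matrix(NA_real_, nrow = nrow(scores), ncol = length(lev),
              dimnames = list(NULL, lev))

# Two-column index: (person row, position of that cell's label in lev)
idx <- cbind(as.vector(row(labels)), match(as.vector(labels), lev))

# Write every score into the cell addressed by its person and label
out[idx] <- as.vector(scores)

out
```

Because this works on plain matrices and never reshapes to long format, it should scale comfortably to the 1000 x 200 case described in the question.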
