Reputation: 339
I have two dataframes with the same length (1000) and width (200). In both dataframes, each row is a person. In one dataframe, each column is a binary item score (i.e. 0 or 1). In the other dataframe, each column is the item label. Here is it:
Dataframe 1:
item1 item2 item3
0 1 1
1 0 0
1 1 1
Dataframe 2:
item1 item2 item3
C2HSD WW11S3 EI22S
WW11S3 2JDDS TT6SQ1
EI22S TT6SQ1 331ID
What I want is a combined and matched dataframe like this:
C2HSD WW11S3 EI22S 2JDDS TT6SQ1 331ID
0 1 1 NA NA NA
NA 1 NA 0 0 NA
NA NA 1 NA 1 1
Thank you!
Upvotes: 3
Views: 72
Reputation: 38500
An attempt in base R uses mapply
and match
as follows. The code below uses match
to return a vector with NAs where a column of dat2 does not have any of the variables and the respective value of dat1 where there is a match in dat2. For the desired output structure, the dat1 data.frame has to be transposed (data.frame(t(dat1))
).
# get the vector of unique names in dat2
vars <- unique(unlist(dat2))
mapply(function(x, y, vars) x[match(vars, y)],
data.frame(t(dat1)), dat2, MoreArgs=list(vars=vars))
X1 X2 X3
[1,] 0 NA NA
[2,] 1 1 NA
[3,] 1 NA 1
[4,] NA 0 NA
[5,] NA 0 1
[6,] NA NA 1
to return a data.frame with the named variables, wrap this in t
, data.frame
, and setNames
.
setNames(data.frame(t(mapply(function(x, y, vars) x[match(vars, y)],
data.frame(t(dat1)), dat2, MoreArgs=list(vars=vars)))), vars)
C2HSD WW11S3 EI22S 2JDDS TT6SQ1 331ID
X1 0 1 1 NA NA NA
X2 NA 1 NA 0 0 NA
X3 NA NA 1 NA 1 1
The data below has dat2 as character vectors rather than factors. This is the preferable storage type for this sort of operation.
data
dat1 <-
structure(list(item1 = c(0L, 1L, 1L), item2 = c(1L, 0L, 1L),
item3 = c(1L, 0L, 1L)), .Names = c("item1", "item2", "item3"
), class = "data.frame", row.names = c(NA, -3L))
dat2 <-
structure(list(item1 = c("C2HSD", "WW11S3", "EI22S"), item2 = c("WW11S3",
"2JDDS", "TT6SQ1"), item3 = c("EI22S", "TT6SQ1", "331ID")), .Names = c("item1",
"item2", "item3"), class = "data.frame", row.names = c(NA, -3L
))
Upvotes: 2
Reputation: 886938
We can melt
the two datasets to 'long' format', do a left_join
, and later spread
it to 'wide' format after removing the 'Var2'
library(reshape2)
library(tidyverse)
d1 <- melt(as.matrix(df1))
d2 <- melt(as.matrix(df2))
left_join(d2, d1, by = c("Var1", "Var2")) %>%
select(-Var2) %>%
spread(value.x, value.y) %>%
select(-Var1)
# 2JDDS 331ID C2HSD EI22S TT6SQ WW11S
#1 NA NA 0 1 NA 1
#2 0 NA NA NA 0 1
#3 NA 1 NA 1 1 NA
A base R
option would be to replace
the corresponding column values of 'df2' with NA where the 'df1' values are 0 using Map
, then stack
it to 'data.frame', transform
the 'values' column to factor
and get the frequency with table
un1 <- unique(unlist(df2))
table(transform(stack(Map(function(x,y) replace(y, !x, NA),
df1, df2))[2:1], values = factor(values, levels = un1)))
Upvotes: 2