Reputation: 575
Yeah really simple problem just about to drive me over the deep end as I know there's a logical approach. I have two dataframes, and want to make a new column in the first dataframe 0/1 depending if any rows in the second contain a col1/col2 pair from the first.
Data
df.ex1 <- data.frame('name'=c('sally', 'joe', 'ben', 'nick'), 'grade1'=c('A', 'B', 'F', 'A'))
df.ex2 <- data.frame('name'=c('jed', 'ben', 'sally', 'nick'), 'grade1'=c('A', 'F', 'A', 'C'))
> df.ex1
name grade1
1 sally A
2 joe B
3 ben F
4 nick A
> df.ex2
name grade1
1 jed A
2 ben F
3 sally A
4 nick C
#Expected result:
name grade1 bin
1 sally A 1
2 joe B 0
3 ben F 1
4 nick A 0
The obvious approach would be to check for presence of name-grade pair in second df:
df.ex1$bin <- ifelse(df.ex1[,1:2] %in% df.ex2[,1:2], 1, 0)
But this doesn't work. Why not? What's the correct approach? And actually what's the correct thought process to arrive at the correct approach?
Note, this obviously won't work:
df.ex1$bin <- ifelse(df.ex1[,1] %in% df.ex2[,1] & df.ex1[,2] %in% df.ex2[,2], 1, 0)
Upvotes: 1
Views: 69
Reputation: 39737
You can join the two columns together with interaction
and then use %in%
like you have done it:
df.ex1$bin <- ifelse(interaction(df.ex1[,1:2]) %in% interaction(df.ex2[,1:2]), 1, 0)
df.ex1
# name grade1 bin
#1 sally A 1
#2 joe B 0
#3 ben F 1
#4 nick A 0
interaction
joins in this case the factors of the two columns, selected by e.g. df.ex1[,1:2]
, together and makes one vector.
df.ex1[,1:2] #gives you a data.frame
# name grade1
#1 sally A
#2 joe B
#3 ben F
#4 nick A
interaction(df.ex1[,1:2]) #gives you a factor
#sally.A joe.B ben.F nick.A
Those two factors/vectors are then applied to %in%
.
Alternatively you could use paste
like:
ifelse(do.call(paste, df.ex1[,1:2]) %in% do.call(paste, df.ex2[,1:2]), 1, 0)
Here paste
has the same effect as interaction
. It joins the two columns together and makes one vector.
do.call(paste, df.ex1[,1:2])
#"sally A" "joe B" "ben F" "nick A"
Upvotes: 1
Reputation: 28705
You can use a data.table "update join"
library(data.table)
setDT(df.ex1)
setDT(df.ex2)
df.ex1[, bin := 0]
df.ex1[df.ex2, on = .(name, grade1), bin := 1]
df.ex1
# name grade1 bin
# 1: sally A 1
# 2: joe B 0
# 3: ben F 1
# 4: nick A 0
Upvotes: 1
Reputation: 173737
You're probably looking for something based around a join first:
df.ex2$bin <- 1
res <- merge(df.ex1,df.ex2,all.x = TRUE)
res$bin <- ifelse(is.na(res$bin),0,1)
> res
name grade1 bin
1 ben F 1
2 joe B 0
3 nick A 0
4 sally A 1
There will be lots of equivalent ways to do this using other popular packages like dplyr or data.table.
Upvotes: 3