Garglesoap
Garglesoap

Reputation: 575

R ifelse based on an observation meeting two criteria

Yeah really simple problem just about to drive me over the deep end as I know there's a logical approach. I have two dataframes, and want to make a new column in the first dataframe 0/1 depending if any rows in the second contain a col1/col2 pair from the first.

Data

df.ex1 <- data.frame('name'=c('sally', 'joe', 'ben', 'nick'), 'grade1'=c('A', 'B', 'F', 'A'))
df.ex2 <- data.frame('name'=c('jed', 'ben', 'sally', 'nick'), 'grade1'=c('A', 'F', 'A', 'C'))

> df.ex1
   name grade1
1 sally      A
2   joe      B
3   ben      F
4  nick      A

> df.ex2
   name grade1
1   jed      A
2   ben      F
3 sally      A
4  nick      C

#Expected result:

       name grade1 bin
    1 sally      A  1
    2   joe      B  0
    3   ben      F  1
    4  nick      A  0

The obvious approach would be to check for presence of name-grade pair in second df:

df.ex1$bin <- ifelse(df.ex1[,1:2] %in% df.ex2[,1:2], 1, 0)

But this doesn't work. Why not? What's the correct approach? And actually what's the correct thought process to arrive at the correct approach?

Note, this obviously won't work:

df.ex1$bin <- ifelse(df.ex1[,1] %in% df.ex2[,1] & df.ex1[,2] %in% df.ex2[,2], 1, 0)

Upvotes: 1

Views: 69

Answers (3)

GKi
GKi

Reputation: 39737

You can join the two columns together with interaction and then use %in% like you have done it:

df.ex1$bin <- ifelse(interaction(df.ex1[,1:2]) %in% interaction(df.ex2[,1:2]), 1, 0)
df.ex1
#   name grade1 bin
#1 sally      A   1
#2   joe      B   0
#3   ben      F   1
#4  nick      A   0

interaction joins in this case the factors of the two columns, selected by e.g. df.ex1[,1:2], together and makes one vector.

df.ex1[,1:2] #gives you a data.frame
#   name grade1
#1 sally      A
#2   joe      B
#3   ben      F
#4  nick      A

interaction(df.ex1[,1:2]) #gives you a factor
#sally.A joe.B   ben.F   nick.A 

Those two factors/vectors are then applied to %in%.

Alternatively you could use paste like:

ifelse(do.call(paste, df.ex1[,1:2]) %in% do.call(paste, df.ex2[,1:2]), 1, 0)

Here paste has the same effect as interaction. It joins the two columns together and makes one vector.

do.call(paste, df.ex1[,1:2])
#"sally A" "joe B"   "ben F"   "nick A"

Upvotes: 1

IceCreamToucan
IceCreamToucan

Reputation: 28705

You can use a data.table "update join"

library(data.table)
setDT(df.ex1)
setDT(df.ex2)

df.ex1[, bin := 0]
df.ex1[df.ex2, on = .(name, grade1), bin := 1]

df.ex1
#     name grade1 bin
# 1: sally      A   1
# 2:   joe      B   0
# 3:   ben      F   1
# 4:  nick      A   0

Upvotes: 1

joran
joran

Reputation: 173737

You're probably looking for something based around a join first:

df.ex2$bin <- 1
res <- merge(df.ex1,df.ex2,all.x = TRUE)
res$bin <- ifelse(is.na(res$bin),0,1)

> res
   name grade1 bin
1   ben      F   1
2   joe      B   0
3  nick      A   0
4 sally      A   1

There will be lots of equivalent ways to do this using other popular packages like dplyr or data.table.

Upvotes: 3

Related Questions