Reputation: 2015

Compare two data.frames with different columns to find the rows in data.frame 1 missing in other

I have two data frames as follows:

a1 <- data.frame(a = 1:5, b=letters[1:5], c = 1:5)
a2 <- data.frame(a = 1:3, b=letters[1:3], d = 1:3)

I want to find the rows of a1 that are not present in a2, comparing only the first two columns (a, b). My ideal output would be:

  a b c match
1 1 a 1  yes
2 2 b 2  yes
3 3 c 3  yes
4 4 d 4   no
5 5 e 5   no

I have tried the following,

library(sqldf)
output <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')

but this works only when both data frames have the same number of columns with the same names. I want to match on the (a, b) columns only and get a1 back with a yes/no column.

Can anybody help me with this?

Upvotes: 1

Answers (3)

user2100721

Reputation: 3587

Another option is the match_df function from the plyr package. By default it matches on the columns common to both data frames (here a and b).

library(plyr)
a1$match <- ifelse(row.names(a1) %in% row.names(match_df(a1,a2)),"yes","no")

Output

  a b c match
1 1 a 1   yes
2 2 b 2   yes
3 3 c 3   yes
4 4 d 4    no
5 5 e 5    no

Upvotes: 1

akrun

Reputation: 886938

We can do a left merge and check for NA values in d:

c("no", "yes")[(!is.na(merge(a1, a2, by = c("a", "b"), all.x=TRUE)$d))+1L]
#[1] "yes" "yes" "yes" "no"  "no"
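
One caveat with the merge approach: merge sorts its result by the key columns by default, so the yes/no vector lines up with a1 only when a1 is already in key order. A minimal base R sketch of one way around that is below. The unsorted a1 and the helper columns .row and .found are made up for illustration and are not part of the question.

```r
# Illustrative data: a1 is deliberately NOT sorted by its keys.
a1 <- data.frame(a = c(5L, 1L, 3L), b = c("e", "a", "c"), c = 1:3)
a2 <- data.frame(a = 1:3, b = letters[1:3], d = 1:3)

# Remember the original row order of a1 (.row is a hypothetical helper column).
a1$.row <- seq_len(nrow(a1))

# Keep only the key columns of a2, de-duplicated, plus a marker column.
a2_keys <- unique(a2[c("a", "b")])
a2_keys$.found <- TRUE

# Left merge on the keys, then restore the original order of a1.
m <- merge(a1, a2_keys, by = c("a", "b"), all.x = TRUE)
m <- m[order(m$.row), ]

# Rows without a partner in a2 have NA in .found.
a1$match <- ifelse(is.na(m$.found), "no", "yes")
a1$.row <- NULL
```

De-duplicating the keys of a2 first keeps the merge from producing more rows than a1 has.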

Or, without merging, we can paste the columns together, compare with %in%, and convert the logical result to "yes"/"no":

c('no', 'yes')[(paste(a1$a, a1$b) %in% paste(a2$a, a2$b))+1]
#[1] "yes" "yes" "yes" "no"  "no"
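
If the key columns may vary, the paste comparison can be wrapped in a small function. This is only a sketch: flag_matches is a hypothetical helper, not a function from any package, and it assumes the separator "\r" never occurs in the data (pasting with a plain space could confuse keys such as "1 a" + "b" and "1" + "a b").

```r
# Data frames from the question; columns c and d are ignored by the comparison.
a1 <- data.frame(a = 1:5, b = letters[1:5], c = 1:5)
a2 <- data.frame(a = 1:3, b = letters[1:3], d = 1:3)

# flag_matches() is a hypothetical helper name used for illustration.
flag_matches <- function(x, y, keys, sep = "\r") {
  # Build one key string per row from the chosen columns.
  # unname() keeps column names from being read as paste() arguments.
  key_x <- do.call(paste, c(unname(as.list(x[keys])), sep = sep))
  key_y <- do.call(paste, c(unname(as.list(y[keys])), sep = sep))
  ifelse(key_x %in% key_y, "yes", "no")
}

a1$match <- flag_matches(a1, a2, keys = c("a", "b"))
a1$match
#[1] "yes" "yes" "yes" "no"  "no"
```

Because %in% is evaluated per row of a1, the result keeps the row order of a1 regardless of how a2 is sorted.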

Or using dplyr

library(dplyr)
left_join(a1, a2, by = c("a", "b")) %>%
           mutate(d = c("no", "yes")[(!is.na(d))+1])
#   a b c   d
# 1 1 a 1 yes
# 2 2 b 2 yes
# 3 3 c 3 yes
# 4 4 d 4  no
# 5 5 e 5  no

Upvotes: 4

milan

Reputation: 4970

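Use the function row.match from the prodlim package. It returns the row index of the first match in the second data frame, and NA otherwise. Combine that with ifelse to assign yes/no. row.match compares every column it is given, so restrict both data frames to the key columns a and b.

library(prodlim)
# compare only the key columns; c and d are not part of the match
a1$match <- ifelse(is.na(row.match(a1[, c("a", "b")], a2[, c("a", "b")])), "no", "yes")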

#  a b c match
#1 1 a 1   yes
#2 2 b 2   yes
#3 3 c 3   yes
#4 4 d 4    no
#5 5 e 5    no

Upvotes: 2
