Reputation: 2015
I have two dataframes as follows,
a1 <- data.frame(a = 1:5, b=letters[1:5], c = 1:5)
a2 <- data.frame(a = 1:3, b=letters[1:3], d = 1:3)
I want to find which rows of a1 are not present in a2, comparing only the first two columns (a, b). My ideal output would be:
a b c match
1 1 a 1 yes
2 2 b 2 yes
3 3 c 3 yes
4 4 d 4 no
5 5 e 5 no
I have tried the following,
output <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')
but this works only when both data frames have the same number of columns and the same column names. I want to match on the (a, b) columns only and report the result in a1 as yes / no.
Can anybody help me with this?
Upvotes: 1
Views: 2463
Reputation: 3587
There is another option. You can use the match_df function from the plyr package.
library(plyr)
a1$match <- ifelse(row.names(a1) %in% row.names(match_df(a1,a2)),"yes","no")
Output
a b c match
1 1 a 1 yes
2 2 b 2 yes
3 3 c 3 yes
4 4 d 4 no
5 5 e 5 no
Upvotes: 1
Reputation: 886938
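We can do a merge and find the NA values in d. Note that merge sorts the result by the key columns by default, so the output lines up with a1 only when a1 is already in that order, as it is here.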
c("no", "yes")[(!is.na(merge(a1, a2, by = c("a", "b"), all.x=TRUE)$d))+1L]
#[1] "yes" "yes" "yes" "no" "no"
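If a1 is not sorted by the keys, or d may itself contain NA, the one-liner above can mislabel rows. A minimal base R sketch of one way around that, assuming helper columns named .row and .hit (both hypothetical, not part of the original data):

```r
# Deliberately unsorted a1, to show that its row order is preserved
a1 <- data.frame(a = c(5L, 1L, 3L), b = c("e", "a", "c"), c = 1:3)
a2 <- data.frame(a = 1:3, b = letters[1:3], d = 1:3)

# Remember the original row order (.row is a hypothetical helper column)
a1$.row <- seq_len(nrow(a1))

# Unique keys of a2 plus an explicit marker (.hit is also hypothetical)
keys_a2 <- unique(a2[c("a", "b")])
keys_a2$.hit <- TRUE

# Left join on the key columns, then restore the order of a1
m <- merge(a1, keys_a2, by = c("a", "b"), all.x = TRUE)
m <- m[order(m$.row), ]

a1$match <- ifelse(is.na(m$.hit), "no", "yes")
a1$.row <- NULL
a1$match
#[1] "no"  "yes" "yes"
```

Using a dedicated marker column instead of d means the result does not depend on what a2 happens to store in its other columns.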
Or, without merging, we can paste the columns together, do a comparison with %in%, and convert the logical to "yes"/"no"
c('no', 'yes')[(paste(a1$a, a1$b) %in% paste(a2$a, a2$b))+1]
#[1] "yes" "yes" "yes" "no" "no"
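The same idea can be wrapped in a small function that writes the result back into a1. This is only a sketch: flag_matches is a hypothetical name, and the "\r" separator is an assumption chosen because it is unlikely to occur in the data, which avoids two different key pairs pasting to the same string.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5], c = 1:5)
a2 <- data.frame(a = 1:3, b = letters[1:3], d = 1:3)

# Hypothetical helper: flag rows of x whose key columns also occur in y
flag_matches <- function(x, y, keys) {
  # Build one string per row from the key columns only
  key_x <- do.call(paste, c(x[keys], sep = "\r"))
  key_y <- do.call(paste, c(y[keys], sep = "\r"))
  ifelse(key_x %in% key_y, "yes", "no")
}

a1$match <- flag_matches(a1, a2, c("a", "b"))
a1$match
#[1] "yes" "yes" "yes" "no"  "no"
```

Because %in% returns one value per row of a1, the row order of a1 is kept and duplicate keys in a2 do not add rows.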
Or using dplyr
library(dplyr)
left_join(a1, a2, by = c("a", "b")) %>%
  mutate(d = c("no", "yes")[(!is.na(d))+1])
# a b c d
# 1 1 a 1 yes
# 2 2 b 2 yes
# 3 3 c 3 yes
# 4 4 d 4 no
# 5 5 e 5 no
Upvotes: 4
Reputation: 4970
Use the function row.match from the prodlim package. For each row it returns the index of the first matching row, or NA if there is no match. Combine that with ifelse to assign yes/no.
library(prodlim)
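# Compare only the key columns, so that c and d do not take part in the match
a1$match <- ifelse(is.na(row.match(a1[, c("a", "b")], a2[, c("a", "b")])), "no", "yes")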
# a b c match
#1 1 a 1 yes
#2 2 b 2 yes
#3 3 c 3 yes
#4 4 d 4 no
#5 5 e 5 no
Upvotes: 2