Apolo Reis
Apolo Reis

Reputation: 97

I want to check if 2 columns of any row from df1 == the same 2 columns of any row in df2

I have 2 dfs like this

df1 <- data.frame("A" = c(1,2,3,4,5), "B" = c(10,20,30,40,50), "C" = c(6,7,8,9,11))
df2 <- data.frame("A" = c(10,4,30,20,50), "B" = c(1,40,3,7,5)), "C" = c(12,13,14,15))

I want to find if there is a row in df 1 that == a row in df2 for the columns A and B. You can see that df1[4,1:2] == df2[2,1:2]

I tried

for (i in 1:5){
  if (for (j in 1:5){
    df[i,1:2] == df2[j,1:2]})
    print("true")
}

But it gives me this error: Error in if (for (j in 1:5) { : argument is of length zero

Upvotes: 1

Views: 77

Answers (4)

akrun
akrun

Reputation: 887078

An option with data.table join

library(data.table)
setDT(df1)[!df2, on = .(A, B)]
#   A  B  C
#1: 1 10  6
#2: 2 20  7
#3: 3 30  8
#4: 5 50 11

data

df1 <- data.frame("A" = c(1,2,3,4,5), "B" = c(10,20,30,40,50), "C" = c(6,7,8,9,11))
df2 <- data.frame("A" = c(10,4,30,20,50), "B" = c(1,40,3,7,5), "C" = c(12,13,14,15, 15))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388962

You can row-wise paste the values and check duplicates using %in% :

df1[do.call(paste, df1[1:2]) %in% do.call(paste, df2[1:2]),]
#  A  B C
#4 4 40 9

If you need only single TRUE/FALSE value

any(do.call(paste, df1[1:2]) %in% do.call(paste, df2[1:2]))
#[1] TRUE

If you want to remove rows in df1 which is present in df2 you can use anti_join from dplyr.

dplyr::anti_join(df1, df2, by = c('A', 'B'))
#  A  B  C
#1 1 10  6
#2 2 20  7
#3 3 30  8
#4 5 50 11

To get common rows you can use semi_join/inner_join :

dplyr::semi_join(df1, df2, by = c('A', 'B'))

Upvotes: 2

Rui Barradas
Rui Barradas

Reputation: 76402

Here is a solution returning the rows in df1 and df2, columns A and B, that match.

res <- apply(df2[1:2], 1, function(y){
  apply(df1[1:2], 1, function(x) all(x == y))
})

which(res, arr.ind = TRUE)
#     row col
#[1,]   4   2

w <- which(res, arr.ind = TRUE)
colnames(w) <- c('df1', 'df2')
w
#     df1 df2
#[1,]   4   2

Upvotes: 2

lroha
lroha

Reputation: 34416

You can row bind columns A and B and use anyDuplicated():

anyDuplicated(rbind(df1[1:2], df2[1:2])) > 0
[1] TRUE

If there are potential duplicates within data frames you'll need to make them unique first:

anyDuplicated(rbind(unique(df1[1:2]), unique(df2[1:2]))) > 0

Upvotes: 2

Related Questions