younghyun

Reputation: 341

R: Check whether each row has the same values in two data sets

I want to check, row by row, whether the values in df1 and df2 are the same. The duplicated and all_equal functions are not suitable for this.

Reproducible Sample Data

df1 <- data.frame(a=c(1,2,3),b=c(4,5,6))
df2 <- data.frame(a=c(1,2,4),b=c(4,5,6))

> df1
  a b
1 1 4
2 2 5
3 3 6
> df2
  a b
1 1 4
2 2 5
3 4 6

Expected output

final <- data.frame(a=c(1,2,4),b=c(4,5,6),c=c('T','T','F'))
# column c is the result I need: whether the values in each row are the same

> final
  a b c
1 1 4 T
2 2 5 T
3 4 6 F

I tried the method below, but it is complicated:

# 1. make a row index for df1 and df2
# 2. full_join the two data frames
# 3. left_join the result with df1
# 4. left_join the result with df2
library(dplyr)

df1$idx1 <- seq_len(nrow(df1))
df2$idx2 <- seq_len(nrow(df2))

df3 <- full_join(df1, df2, by = c('a', 'b'))
df3 <- left_join(df3, df1, by = c('a', 'b'))
df3 <- left_join(df3, df2, by = c('a', 'b'))  # not sure whether this works

I think there must be a better way. Any help is appreciated!
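A minimal base-R sketch of the index-join idea from the steps above, assuming rows should be compared by position (as the expected output suggests). Keying the join on the row index as well as on a and b means a row of df2 can only match the row of df1 at the same position. The helper column name idx is hypothetical, and base merge() stands in for the dplyr joins.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 4), b = c(4, 5, 6))

# Step 1: tag each row with its position ('idx' is an illustrative helper name)
df1$idx <- seq_len(nrow(df1))
df2$idx <- seq_len(nrow(df2))

# Step 2: left-join df2 to a flagged copy of df1 on position AND values,
# so a row of df2 only matches the df1 row in the same position with the same values
joined <- merge(df2, transform(df1, c = "T"), by = c("idx", "a", "b"), all.x = TRUE)

# Step 3: unmatched rows get NA in c; mark them "F"
joined$c[is.na(joined$c)] <- "F"

# Step 4: restore the original row order and drop the helper column
final <- joined[order(joined$idx), c("a", "b", "c")]
rownames(final) <- NULL
final
#   a b c
# 1 1 4 T
# 2 2 5 T
# 3 4 6 F
```

Because the index is part of the join key, no second round of joins is needed.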

Upvotes: 1

Views: 706

Answers (4)

GKi

Reputation: 39657

If the positions of the rows in each data.frame do not matter, you can use merge.

within(merge(df2, within(df1, c <- "T"), all.x=TRUE), c[is.na(c)] <- "F")
#  a b c
#1 1 4 T
#2 2 5 T
#3 4 6 F
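To illustrate what "positions do not matter" means, here is a small sketch that reorders df1 (the reshuffled copy df1_shuffled is hypothetical, made up for this example). merge() matches on the shared columns a and b, so it still finds the two common rows, while a purely positional comparison finds none.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 4), b = c(4, 5, 6))

# Same rows as df1, but in a different order
df1_shuffled <- df1[c(3, 1, 2), ]

# merge() joins on the common columns (a, b), so row order in df1 is irrelevant
res <- within(merge(df2, within(df1_shuffled, c <- "T"), all.x = TRUE),
              c[is.na(c)] <- "F")
res
#   a b c
# 1 1 4 T
# 2 2 5 T
# 3 4 6 F

# A positional comparison, by contrast, sees no identical rows here
positional <- rowSums(df1_shuffled != df2) == 0
positional
```

Note that merge() sorts its result by the key columns by default, so the output order can differ from df2's original order.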

Alternatively, using duplicated:

df2$c <- c("F", "T")[1+tail(duplicated(rbind(df1, df2)), nrow(df2))]
df2
#  a b c
#1 1 4 T
#2 2 5 T
#3 4 6 F
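A step-by-step sketch of how the duplicated one-liner works, printing the intermediate values: rbind stacks df1 on top of df2, duplicated flags every row that already appeared earlier, tail keeps only the flags belonging to df2, and adding 1 turns FALSE/TRUE into the indices 1/2 for c("F", "T"). The last lines use a hypothetical df2_rep to show one caveat: a row repeated within df2 is also flagged, even if it does not occur in df1.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 4), b = c(4, 5, 6))

stacked <- rbind(df1, df2)          # df1's rows first, then df2's
dup <- duplicated(stacked)          # TRUE where a row already appeared earlier
dup
# [1] FALSE FALSE FALSE  TRUE  TRUE FALSE
in_df1 <- tail(dup, nrow(df2))      # keep only the flags for df2's rows
in_df1
# [1]  TRUE  TRUE FALSE
flag <- c("F", "T")[1 + in_df1]     # FALSE -> index 1 ("F"), TRUE -> index 2 ("T")
flag
# [1] "T" "T" "F"

# Caveat: a row repeated within df2 itself is flagged too, although it is not in df1
df2_rep <- data.frame(a = c(7, 7), b = c(8, 8))
rep_flags <- tail(duplicated(rbind(df1, df2_rep)), nrow(df2_rep))
rep_flags
# [1] FALSE  TRUE
```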

Upvotes: 0

Ronak Shah

Reputation: 388907

You may use rowSums:

final <- df2
final$c <- rowSums(df1 != df2) == 0
final

#  a b     c
#1 1 4  TRUE
#2 2 5  TRUE
#3 4 6 FALSE
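The expected output in the question uses the strings "T"/"F" rather than TRUE/FALSE. A small sketch of the same rowSums idea with that conversion added via ifelse. It assumes, as the one-liner does, that df1 and df2 have the same number of rows and the same columns in the same order; a missing value in either data frame would make that row's result NA.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 4), b = c(4, 5, 6))

final <- df2
# df1 != df2 is a logical matrix (one column per variable); a row with
# zero mismatches is identical in both data frames
same <- rowSums(df1 != df2) == 0
# Convert TRUE/FALSE to the "T"/"F" strings used in the expected output
final$c <- ifelse(same, "T", "F")
final
#   a b c
# 1 1 4 T
# 2 2 5 T
# 3 4 6 F
```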

Upvotes: 0

Brian Syzdek

Reputation: 948

You can get column 'c' by:

c <- df1$a == df2$a & df1$b == df2$b

This gives TRUE TRUE FALSE. It looks like you then want to bind this to df2, so:

cbind.data.frame(df2, c)
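The expression above names each column explicitly, which gets long with many columns. A sketch of one way to generalize it to all columns: df1 == df2 compares every column at once, and apply(..., 1, all) then asks, for each row, whether every column matched. The result is stored as same rather than c here only to avoid confusion with base::c(); like the original, it assumes both data frames have the same shape and column order.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 4), b = c(4, 5, 6))

# Element-wise comparison of all columns, then "all columns equal?" per row
same <- apply(df1 == df2, 1, all)

# Bind the result to df2 as a column named c
final <- cbind.data.frame(df2, c = unname(same))
final
#   a b     c
# 1 1 4  TRUE
# 2 2 5  TRUE
# 3 4 6 FALSE
```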

Upvotes: 2

akrun

Reputation: 887048

We could use

df2$c <- Reduce(`&`, Map(`==`, df1, df2))

Output:

> df2
  a b     c
1 1 4  TRUE
2 2 5  TRUE
3 4 6 FALSE
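A sketch that splits the one-liner into its two steps to show what each does. Because a data.frame is a list of columns, Map(`==`, df1, df2) compares the columns pairwise (matched by position, not by name) and returns one logical vector per column; Reduce(`&`, ...) then folds that list with &, so a row ends up TRUE only if it matched in every column.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 4), b = c(4, 5, 6))

# Step 1: one logical vector per column pair (a with a, b with b)
per_col <- Map(`==`, df1, df2)
str(per_col)

# Step 2: combine the per-column results with `&`, row by row
same <- Reduce(`&`, per_col)
same
# [1]  TRUE  TRUE FALSE
```

The same pattern works unchanged for any number of columns, since Map and Reduce simply iterate over however many columns there are.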

Upvotes: 2
