Reputation: 415

Matching 2 almost same dataframes to be equal

I have a df1 like this:

text 1
text 2
text 3
text 4
text 5

And another one df2 like this:

text 1
text 2
text 3
text 5

The problem is my dfs are almost the same, they have many rows and I can find the one that is additional to the first df in order to know who is it?

Is there any possible option to compare the two dfs and find the one row it is the difference between them?

Upvotes: 4

Answers (4)

bgoldst

Reputation: 35314

Here's a data.table solution:

library(data.table)
df1 <- data.frame(V1=rep('text',5),V2=1:5)
df2 <- data.frame(V1=rep('text',4),V2=c(1:3,5))
setkey(setDT(df1))[!df2]
##      V1 V2
## 1: text  4

Upvotes: 5

mpalanco

Reputation: 13570

Base package:

df <- merge(x = df1, y = df2, by = "V2", all.x = TRUE)
df[!complete.cases(df), ]

Output:

  V2 V1.x V1.y
4  4 text <NA>

With sqldf:

library(sqldf)
sqldf("SELECT * FROM df1 
      LEFT JOIN 
      df2 USING (V2)
      WHERE df2.V1 IS NULL")

Output:

    V1 V2   V1
1 text  4 <NA>

Upvotes: 2

SabDeM

Reputation: 7190

A base R solution.

 df1[-merge(df1, df2)[,2], ]
    V1 V2
4 text  4

or:

 df1[-which(df1[ , 2] %in% df2[, 2]), ]
    V1 V2
4 text  4

Edit

after thinking about a base R solution I realized that my previous solutions might be weak with some data. Here is a more robust solution I think.

 df1[ !df1$V2 %in% merge(df1, df2)[,2, drop = T], ]
    V1 V2
4 text  4

Upvotes: 3

Rich Scriven

Reputation: 99341

You can rbind the two and then find the non-duplicated rows.

For example, if you have data frames a and b then

x <- rbind(a, b)
x[!duplicated(x) & !duplicated(x, fromLast = TRUE), ]
#     V1 V2
# 4 text  4

Or if you prefer, you can use dplyr::setdiff(), which has a data frame method.

dplyr::setdiff(a, b)
#     V1 V2
# 1 text  4

where

a <- read.table(text = "text 1
text 2
text 3
text 4
text 5", header = FALSE)

b <- a[-4, ]

Upvotes: 5

Matching 2 almost same dataframes to be equal

Answers (4)

Edit

Related Questions