Reputation: 415
I have a df1 like this:
text 1
text 2
text 3
text 4
text 5
And another one df2 like this:
text 1
text 2
text 3
text 5
The problem is my dfs are almost the same, they have many rows and I can find the one that is additional to the first df in order to know who is it?
Is there any possible option to compare the two dfs and find the one row it is the difference between them?
Upvotes: 4
Views: 108
Reputation: 35314
Here's a data.table solution:
library(data.table)
df1 <- data.frame(V1=rep('text',5),V2=1:5)
df2 <- data.frame(V1=rep('text',4),V2=c(1:3,5))
setkey(setDT(df1))[!df2]
## V1 V2
## 1: text 4
Upvotes: 5
Reputation: 13570
Base package:
df <- merge(x = df1, y = df2, by = "V2", all.x = TRUE)
df[!complete.cases(df), ]
Output:
V2 V1.x V1.y
4 4 text <NA>
With sqldf
:
library(sqldf)
sqldf("SELECT * FROM df1
LEFT JOIN
df2 USING (V2)
WHERE df2.V1 IS NULL")
Output:
V1 V2 V1
1 text 4 <NA>
Upvotes: 2
Reputation: 7190
A base R solution.
df1[-merge(df1, df2)[,2], ]
V1 V2
4 text 4
or:
df1[-which(df1[ , 2] %in% df2[, 2]), ]
V1 V2
4 text 4
after thinking about a base R solution I realized that my previous solutions might be weak with some data. Here is a more robust solution I think.
df1[ !df1$V2 %in% merge(df1, df2)[,2, drop = T], ]
V1 V2
4 text 4
Upvotes: 3
Reputation: 99341
You can rbind
the two and then find the non-duplicated rows.
For example, if you have data frames a
and b
then
x <- rbind(a, b)
x[!duplicated(x) & !duplicated(x, fromLast = TRUE), ]
# V1 V2
# 4 text 4
Or if you prefer, you can use dplyr::setdiff()
, which has a data frame method.
dplyr::setdiff(a, b)
# V1 V2
# 1 text 4
where
a <- read.table(text = "text 1
text 2
text 3
text 4
text 5", header = FALSE)
b <- a[-4, ]
Upvotes: 5