Reputation: 15
I have two data frames in R: Large and Small. The smaller one is contained in the larger one. Importantly, there are no unique identifiers for each row in either data frame. How can I obtain the following:
Large - Small [large minus small], i.e. the rows of the large data frame that do not appear in the small one.
Small data-frame (SmallDF):
ID CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203079 10 11 14 16 -9 -9
203079 8 12 14 17 -9 -9
203080 10 12 13 13 -9 -9
Large data-frame (BigDF):
ID CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203078 -9 -9 15 15 18 20
203078 -9 -9 14 15 17 19
203079 10 11 14 16 -9 -9
203079 8 12 14 17 -9 -9
203080 10 12 13 13 -9 -9
203080 10 11 14 16 -9 -9
203081 10 12 14 16 -9 -9
203081 11 12 15 16 -9 -9
203082 11 11 13 15 -9 -9
203082 11 11 13 14 -9 -9
The small data frame corresponds to rows 3, 4 and 5 of the larger data frame.
I have tried the following.
BigDF[ !(BigDF$ID %in% SmallDF$ID), ]
This doesn't work because there are no unique identifiers for the rows in either data frame. The output I get is exactly the same as BigDF.
I have also tried the following.
library(dplyr)
setdiff(BigDF, SmallDF)
The output I receive is exactly the same as BigDF.
Any help would be appreciated! Thanks.
Upvotes: 1
Views: 865
Reputation: 23101
With base R:
BigDF[-which(duplicated(rbind(BigDF, SmallDF), fromLast = TRUE)),]
with output:
ID CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
1 203078 -9 -9 15 15 18 20
2 203078 -9 -9 14 15 17 19
6 203080 10 11 14 16 -9 -9
7 203081 10 12 14 16 -9 -9
8 203081 11 12 15 16 -9 -9
9 203082 11 11 13 15 -9 -9
10 203082 11 11 13 14 -9 -9
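One caveat with negative indexing through which(): if no rows match, which() returns integer(0), and indexing with -integer(0) selects zero rows rather than all of them. The following is a minimal sketch, not the only way to do this, that rebuilds the question's data by hand and uses a logical index instead. The names cols, is_dup, result and no_match are introduced here for illustration only.

```r
# Data retyped from the question; SmallDF is rows 3, 4 and 5 of BigDF.
cols <- c("ID", "CSF1PO", "CSF1PO.1", "D10S1248", "D10S1248.1",
          "D12S391", "D12S391.1")
BigDF <- as.data.frame(matrix(c(
  203078, -9, -9, 15, 15, 18, 20,
  203078, -9, -9, 14, 15, 17, 19,
  203079, 10, 11, 14, 16, -9, -9,
  203079,  8, 12, 14, 17, -9, -9,
  203080, 10, 12, 13, 13, -9, -9,
  203080, 10, 11, 14, 16, -9, -9,
  203081, 10, 12, 14, 16, -9, -9,
  203081, 11, 12, 15, 16, -9, -9,
  203082, 11, 11, 13, 15, -9, -9,
  203082, 11, 11, 13, 14, -9, -9
), ncol = 7, byrow = TRUE, dimnames = list(NULL, cols)))
SmallDF <- BigDF[3:5, ]
rownames(SmallDF) <- NULL

# Flag rows of BigDF that reappear later in the stacked data,
# keeping only the flags that belong to BigDF itself.
is_dup <- duplicated(rbind(BigDF, SmallDF), fromLast = TRUE)[seq_len(nrow(BigDF))]
result <- BigDF[!is_dup, ]
nrow(result)  # 7 rows remain

# The pitfall: with no matches, -which() drops everything.
no_match <- rep(FALSE, nrow(BigDF))
nrow(BigDF[-which(no_match), ])  # 0
nrow(BigDF[!no_match, ])         # 10
```

Note that with fromLast = TRUE a row of BigDF is also flagged when it is repeated later within BigDF itself, so this assumes BigDF has no internal duplicates that should be kept.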
Upvotes: 2
Reputation: 5008
library(dplyr)
anti_join(BigDF, SmallDF)
This is equivalent to:
anti_join(BigDF, SmallDF, by=c("ID", "CSF1PO", "CSF1PO.1", "D10S1248", "D10S1248.1", "D12S391", "D12S391.1"))
If two variables together uniquely identify a row, you can pass just those variables in the vector given to the by argument:
anti_join(BigDF, SmallDF, by=c("ID", "CSF1PO.1"))
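The choice of by columns matters when the key does not uniquely identify a row. As a rough sketch on the question's data (retyped by hand, with all_cols and id_only being names made up for this example), joining on ID alone removes one row too many, because row 6 of BigDF shares its ID with a row in SmallDF:

```r
library(dplyr)

# Data retyped from the question; SmallDF is rows 3, 4 and 5 of BigDF.
cols <- c("ID", "CSF1PO", "CSF1PO.1", "D10S1248", "D10S1248.1",
          "D12S391", "D12S391.1")
BigDF <- as.data.frame(matrix(c(
  203078, -9, -9, 15, 15, 18, 20,
  203078, -9, -9, 14, 15, 17, 19,
  203079, 10, 11, 14, 16, -9, -9,
  203079,  8, 12, 14, 17, -9, -9,
  203080, 10, 12, 13, 13, -9, -9,
  203080, 10, 11, 14, 16, -9, -9,
  203081, 10, 12, 14, 16, -9, -9,
  203081, 11, 12, 15, 16, -9, -9,
  203082, 11, 11, 13, 15, -9, -9,
  203082, 11, 11, 13, 14, -9, -9
), ncol = 7, byrow = TRUE, dimnames = list(NULL, cols)))
SmallDF <- BigDF[3:5, ]

# Match on every column: only exact row matches are removed.
all_cols <- anti_join(BigDF, SmallDF, by = names(BigDF))
nrow(all_cols)  # 7

# Match on ID only: every BigDF row with ID 203079 or 203080 is removed.
id_only <- anti_join(BigDF, SmallDF, by = "ID")
nrow(id_only)   # 6
```

On this particular data, by=c("ID", "CSF1PO.1") happens to give the same 7 rows as matching on all columns, but that depends on the data and is worth checking before relying on a reduced key.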
Upvotes: 3