Sumer Vaid

Reputation: 15

Subtracting a smaller data frame from a larger data frame in R without a unique row ID

I have two data frames in R: Large and Small. The smaller one is contained in the larger one. Importantly, there are no unique identifiers for each row in either data frame. How can I obtain the following:

Large - Small [large minus small]

Small data-frame (SmallDF):

     ID       CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203079           10       11       14         16      -9        -9
203079            8       12       14         17      -9        -9
203080           10       12       13         13      -9        -9

Large data-frame (BigDF):

      ID      CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203078          -9       -9       15         15      18        20
203078          -9       -9       14         15      17        19
203079          10       11       14         16      -9        -9
203079           8       12       14         17      -9        -9
203080          10       12       13         13      -9        -9
203080          10       11       14         16      -9        -9
203081          10       12       14         16      -9        -9
203081          11       12       15         16      -9        -9
203082          11       11       13         15      -9        -9
203082          11       11       13         14      -9        -9    

The small data frame corresponds to the rows 3, 4 and 5 of the larger data frame.

I have tried the following.

BigDF[ !(BigDF$ID %in% SmallDF$ID), ] 

This doesn't work because there are no unique identifiers for the rows: the same ID appears on several rows. The output I get is exactly the same as BigDF.

I have also tried the following.

library(dplyr)
setdiff(BigDF, SmallDF)

The output I receive is exactly the same as BigDF.

Any help would be appreciated! Thanks.

Upvotes: 1

Views: 865

Answers (2)

Sandipan Dey

Reputation: 23101

With base R:

BigDF[-which(duplicated(rbind(BigDF, SmallDF), fromLast = TRUE)),]

with output:

       ID CSF1PO CSF1PO.1 D10S1248 D10S1248.1 D12S391 D12S391.1
1  203078     -9       -9       15         15      18        20
2  203078     -9       -9       14         15      17        19
6  203080     10       11       14         16      -9        -9
7  203081     10       12       14         16      -9        -9
8  203081     11       12       15         16      -9        -9
9  203082     11       11       13         15      -9        -9
10 203082     11       11       13         14      -9        -9
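One caveat about the one-liner above: if no row of BigDF appears in SmallDF, `which()` returns an empty vector and negative indexing with it returns zero rows instead of all of BigDF. A minimal sketch of a variant that uses a logical index instead, assuming the data is rebuilt by hand from the tables in the question (the names `is_dup`, `keep` and `result` are only illustrative):

```r
# Rebuild the example data from the question (values copied from the tables).
BigDF <- data.frame(
  ID         = c(203078, 203078, 203079, 203079, 203080, 203080, 203081, 203081, 203082, 203082),
  CSF1PO     = c(-9, -9, 10, 8, 10, 10, 10, 11, 11, 11),
  CSF1PO.1   = c(-9, -9, 11, 12, 12, 11, 12, 12, 11, 11),
  D10S1248   = c(15, 14, 14, 14, 13, 14, 14, 15, 13, 13),
  D10S1248.1 = c(15, 15, 16, 17, 13, 16, 16, 16, 15, 14),
  D12S391    = c(18, 17, -9, -9, -9, -9, -9, -9, -9, -9),
  D12S391.1  = c(20, 19, -9, -9, -9, -9, -9, -9, -9, -9)
)
SmallDF <- BigDF[3:5, ]
rownames(SmallDF) <- NULL

# Flag every row that has an identical copy later in the stacked data.
is_dup <- duplicated(rbind(BigDF, SmallDF), fromLast = TRUE)

# Keep only the flags that belong to BigDF and negate them.
keep <- !is_dup[seq_len(nrow(BigDF))]
result <- BigDF[keep, ]

# With an empty SmallDF the logical index keeps all rows of BigDF.
empty_small <- BigDF[0, ]
is_dup_empty <- duplicated(rbind(BigDF, empty_small), fromLast = TRUE)
result_empty <- BigDF[!is_dup_empty[seq_len(nrow(BigDF))], ]
```

Note that both this variant and the original one-liner also drop the earlier copy of any row that is duplicated within BigDF itself, so they are only a true "large minus small" when the rows of BigDF are distinct.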

Upvotes: 2

yeedle

Reputation: 5008

library(dplyr)
anti_join(BigDF, SmallDF)

This is equivalent to:

anti_join(BigDF, SmallDF, by=c("ID", "CSF1PO", "CSF1PO.1", "D10S1248", "D10S1248.1", "D12S391", "D12S391.1"))

Obviously, if you had two variables which uniquely identify a row, you can specify just these variables in the vector passed to by:

anti_join(BigDF, SmallDF, by=c("ID", "CSF1PO.1"))
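To make the effect of `by` concrete, here is a small self-contained sketch, assuming the data is rebuilt by hand from the tables in the question and that dplyr is installed (the result names `res_all`, `res_two` and `res_id` are only illustrative). It compares joining on all columns, on the two columns above, and on `ID` alone:

```r
library(dplyr)

# Rebuild the example data from the question (values copied from the tables).
BigDF <- data.frame(
  ID         = c(203078, 203078, 203079, 203079, 203080, 203080, 203081, 203081, 203082, 203082),
  CSF1PO     = c(-9, -9, 10, 8, 10, 10, 10, 11, 11, 11),
  CSF1PO.1   = c(-9, -9, 11, 12, 12, 11, 12, 12, 11, 11),
  D10S1248   = c(15, 14, 14, 14, 13, 14, 14, 15, 13, 13),
  D10S1248.1 = c(15, 15, 16, 17, 13, 16, 16, 16, 15, 14),
  D12S391    = c(18, 17, -9, -9, -9, -9, -9, -9, -9, -9),
  D12S391.1  = c(20, 19, -9, -9, -9, -9, -9, -9, -9, -9)
)
SmallDF <- BigDF[3:5, ]

# Join on all shared columns: drops exactly the three rows found in SmallDF.
res_all <- anti_join(BigDF, SmallDF)

# In this data, ID plus CSF1PO.1 is enough to tell the rows apart.
res_two <- anti_join(BigDF, SmallDF, by = c("ID", "CSF1PO.1"))

# ID alone is not: the sixth row shares ID 203080 with a row of SmallDF,
# so it is removed as well.
res_id <- anti_join(BigDF, SmallDF, by = "ID")

nrow(res_all)  # 7
nrow(res_two)  # 7
nrow(res_id)   # 6
```

The last call shows why the columns in `by` need to identify rows uniquely: with too few columns, `anti_join` removes every row of BigDF that matches on those columns, not just the rows that are actually in SmallDF.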

Upvotes: 3
