EricaO
EricaO

Reputation: 471

Finding common rows in 2 dataframes

I've searched online for a bit and cannot find a solution to my problem.

I have two dataframes (of different lengths):

dataSC_SC_combos

SURVEY_DATE DATA_COLLECTION_SITE
2012-07-01  Site 1
2012-07-01  Site 2
2012-08-10  Site 2
2012-08-10  Site 3
2012-08-11  Site 2
2012-09-20  Site 1

and

dataSC_FSITE_combos

SURVEY_DATE FISHING_SITE
2012-07-01  Site 1
2012-07-01  Site 3
2012-08-10  Site 2
2012-08-11  Site 1
2012-08-11  Site 1
2012-09-20  Site 1
2012-09-26  Site 1
2012-09-27  Site 1
2012-10-14  Site 1

I want to find the unique combinations of data and site from these, aka.

2012-07-01  Site 1
2012-08-10  Site 2
2012-09-20  Site 1

and then subset from a larger dataset for these unique combinations for either SURVEY_DATE & FISHING_SITE or SURVEY_DATE & DATA_COLLECTION_SITE. So that I get something like this:

SURVEY_DATE FISHING_SITE DATA_COLLECTION_SITE
2012-07-01  Site 1       
2012-07-01               Site 1
2012-08-10               Site 2
2012-09-20  Site 1
2012-09-20               Site 1

Does anyone know of a good way to go about doing this? Thanks in advance.

Upvotes: 2

Views: 985

Answers (2)

lukeA
lukeA

Reputation: 54287

Using dplyr for merging and plyr for ordering and binding rows (do they live in harmony with each other?):

library(dplyr)
library(plyr)
names(dataSC_FSITE_combos) <- names(dataSC_SC_combos) <- c("SURVEY_DATE", "SITE")
ijoin <- inner_join(dataSC_SC_combos, dataSC_FSITE_combos)
#   SURVEY_DATE   SITE
# 1  2012-07-01 Site 1
# 2  2012-08-10 Site 2
# 3  2012-09-20 Site 1

arrange(rbind.fill(
  setNames(
    left_join(ijoin, dataSC_SC_combos), c("SURVEY_DATE","DATA_COLLECTION_SITE")),
  setNames(
    left_join(ijoin, dataSC_FSITE_combos), c("SURVEY_DATE", "FISHING_SITE"))), SURVEY_DATE)
#   SURVEY_DATE DATA_COLLECTION_SITE FISHING_SITE
# 1  2012-07-01               Site 1         <NA>
# 2  2012-07-01                 <NA>       Site 1
# 3  2012-08-10               Site 2         <NA>
# 4  2012-08-10                 <NA>       Site 2
# 5  2012-09-20               Site 1         <NA>
# 6  2012-09-20                 <NA>       Site 1

Upvotes: 1

Julien Navarre
Julien Navarre

Reputation: 7840

There is probably better solutions but you can do :

d1 <- do.call("paste", df1)
d2 <- do.call("paste", df2)
> df1[d1%in%d2, ]
  SURVEY_DATE DATA_COLLECTION_SITE
1  2012-07-01               Site 1
3  2012-08-10               Site 2
6  2012-09-20               Site 1

And for the final result :

> df3 <- df1[d1%in%d2, ]
> df4 <- df2[d2%in%d1, ]
> 
> df3$FISHING_SITE <- NA
> df4$DATA_COLLECTION_SITE <- NA
> 
> rbind(df3, df4)
   SURVEY_DATE DATA_COLLECTION_SITE FISHING_SITE
1   2012-07-01               Site 1         <NA>
3   2012-08-10               Site 2         <NA>
6   2012-09-20               Site 1         <NA>
11  2012-07-01                 <NA>       Site 1
31  2012-08-10                 <NA>       Site 2
61  2012-09-20                 <NA>       Site 1

Upvotes: 1

Related Questions