Reputation: 471
I've searched online for a bit and cannot find a solution to my problem.
I have two dataframes (of different lengths):
dataSC_SC_combos
SURVEY_DATE DATA_COLLECTION_SITE
2012-07-01 Site 1
2012-07-01 Site 2
2012-08-10 Site 2
2012-08-10 Site 3
2012-08-11 Site 2
2012-09-20 Site 1
and
dataSC_FSITE_combos
SURVEY_DATE FISHING_SITE
2012-07-01 Site 1
2012-07-01 Site 3
2012-08-10 Site 2
2012-08-11 Site 1
2012-08-11 Site 1
2012-09-20 Site 1
2012-09-26 Site 1
2012-09-27 Site 1
2012-10-14 Site 1
I want to find the unique combinations of data and site from these, aka.
2012-07-01 Site 1
2012-08-10 Site 2
2012-09-20 Site 1
and then subset from a larger dataset for these unique combinations for either SURVEY_DATE & FISHING_SITE or SURVEY_DATE & DATA_COLLECTION_SITE. So that I get something like this:
SURVEY_DATE FISHING_SITE DATA_COLLECTION_SITE
2012-07-01 Site 1
2012-07-01 Site 1
2012-08-10 Site 2
2012-09-20 Site 1
2012-09-20 Site 1
Does anyone know of a good way to go about doing this? Thanks in advance.
Upvotes: 2
Views: 985
Reputation: 54287
Using dplyr
for merging and plyr
for ordering and binding rows (do they live in harmony with each other?):
library(dplyr)
library(plyr)
names(dataSC_FSITE_combos) <- names(dataSC_SC_combos) <- c("SURVEY_DATE", "SITE")
ijoin <- inner_join(dataSC_SC_combos, dataSC_FSITE_combos)
# SURVEY_DATE SITE
# 1 2012-07-01 Site 1
# 2 2012-08-10 Site 2
# 3 2012-09-20 Site 1
arrange(rbind.fill(
setNames(
left_join(ijoin, dataSC_SC_combos), c("SURVEY_DATE","DATA_COLLECTION_SITE")),
setNames(
left_join(ijoin, dataSC_FSITE_combos), c("SURVEY_DATE", "FISHING_SITE"))), SURVEY_DATE)
# SURVEY_DATE DATA_COLLECTION_SITE FISHING_SITE
# 1 2012-07-01 Site 1 <NA>
# 2 2012-07-01 <NA> Site 1
# 3 2012-08-10 Site 2 <NA>
# 4 2012-08-10 <NA> Site 2
# 5 2012-09-20 Site 1 <NA>
# 6 2012-09-20 <NA> Site 1
Upvotes: 1
Reputation: 7840
There is probably better solutions but you can do :
d1 <- do.call("paste", df1)
d2 <- do.call("paste", df2)
> df1[d1%in%d2, ]
SURVEY_DATE DATA_COLLECTION_SITE
1 2012-07-01 Site 1
3 2012-08-10 Site 2
6 2012-09-20 Site 1
And for the final result :
> df3 <- df1[d1%in%d2, ]
> df4 <- df2[d2%in%d1, ]
>
> df3$FISHING_SITE <- NA
> df4$DATA_COLLECTION_SITE <- NA
>
> rbind(df3, df4)
SURVEY_DATE DATA_COLLECTION_SITE FISHING_SITE
1 2012-07-01 Site 1 <NA>
3 2012-08-10 Site 2 <NA>
6 2012-09-20 Site 1 <NA>
11 2012-07-01 <NA> Site 1
31 2012-08-10 <NA> Site 2
61 2012-09-20 <NA> Site 1
Upvotes: 1