Reputation: 654
This might seem a somewhat basic question, but after reviewing documentation and searching the Internet I couldn't find an answer to this.
My question is: does R's dplyr
include an argument to automatically generate a column with merge results after joining two dataframes? With join operations I mean left_join
, right_join
, or full_join
.
I'm thinking of something similar to the _merge
column you can generate on Python's pandas
with the argument indicator = True
when using merge
, or to Stata's _merge
variable that is generated by default using the merge
command.
For example:
df1 <- data.frame(id = c(1,2), value1 = c(1,1))
df2 <- data.frame(id = c(1,3), value2 = c(2,1))
result <- df1 %>% full_join(df2, by = 'id', merge_results_option = TRUE)
# Notice that merge_results_option is the argument I'm looking for with this question, if it exists
Should yield:
|--------------|---------------|--------------|--------------|
| id | value1 | value2 | _merge |
|--------------|---------------|--------------|--------------|
| 1 | 1 | 2 | both |
| 2 | 1 | NA | left |
| 3 | NA | 1 | right |
|--------------|---------------|--------------|--------------|
Upvotes: 0
Views: 384
Reputation: 509
As already said, I don't think this is a functionality of dplyr. Interestingly it's not something that is used in SQL either where joins are very common. However, the column can be recreated using mutate:
df1 <- data.frame(id = c(1,2), value1 = c(1,1))
df2 <- data.frame(id = c(1,3), value2 = c(2,1))
result <- df1 %>%
left_join(df2, by = 'id') %>%
mutate(merge = case_when(!is.na(value1) & !is.na(value2) ~ "both",
!is.na(value1) & is.na(value2) ~ "left",
is.na(value1) & !is.na(value2) ~ "right",
TRUE ~ "missing"
))
Upvotes: 1
Reputation: 146239
This is implemented, more-or-less, by the tidylog
package. It prints a message summarizing the number of matches, but does add a column identifying each row.
You can read the discussion on the dplyr
Feature request here, which lead to the development of tidylog
, which ended with
Given that
tidylog
has solved this problem so wonderfully, I no longer think this needs to be indplyr
though some users were still dissatisfied with lack of the column-add functionality. Surprisingly, I didn't see that feature request in the tidylog issues, so maybe you could add it there.
Upvotes: 1