Reputation: 1731
Okay, let's imagine we have a table (dataframe, tibble) about articles written by authors. Each article has been reviewed by another author, but not all authors are reviewers, e.g.
articles <- tibble(author=c("user1", "user2", "user3"), reviewer=c("user2", "user3", "user2"))
> articles
# A tibble: 3 x 2
author reviewer
<chr> <chr>
1 user1 user2
2 user2 user3
3 user3 user2
to just look at the columns of interest here. So we can see that user2
and user3
have authored articles and are also reviewers. user1
is not a reviewer. How do we actually return the users who have both authored and reviewed articles? In base R you could do:
authors <- unique(articles$author)
reviewers <- unique(articles$reviewer)
> authors[authors %in% reviewers]
[1] "user2" "user3"
but how can we do this in the tidyverse?
Upvotes: 4
Views: 76
Reputation: 28850
In base
we can do:
unique(articles$author[articles$author %in% articles$reviewer])
# [1] "user2" "user3"
Upvotes: 1
Reputation: 76402
Here is a solution without joins.
articles %>%
filter(author %in% reviewer) %>%
select(author) %>%
distinct()
## A tibble: 2 x 1
# author
# <chr>
#1 user2
#2 user3
Upvotes: 3
Reputation: 1731
It feels the answer would likely be in dplyr
, and I have refined my approach down to:
articles %>%
semi_join((.),
by=c("author"="reviewer")) %>%
select(author) %>%
distinct
# A tibble: 2 x 1
author
<chr>
1 user2
2 user3
This is the correct answer, and I admit I've refined it somewhat even whilst writing this. But has anyone got any alternative takes on the problem? I'm actually quite pleased how it's boiled down, but it took me a little thinking so I'll post it in the hope it helps someone else.
Upvotes: 4