doctorG
doctorG

Reputation: 1731

How to find values from one column that occur anywhere in another column of the same table

Okay, let's imagine we have a table (dataframe, tibble) about articles written by authors. Each article has been reviewed by another author, but not all authors are reviewers, e.g.

articles <- tibble(author=c("user1", "user2", "user3"), reviewer=c("user2", "user3", "user2"))
> articles
# A tibble: 3 x 2
  author reviewer
  <chr>  <chr>   
1 user1  user2   
2 user2  user3   
3 user3  user2   

to just look at the columns of interest here. So we can see that user2 and user3 have authored articles and are also reviewers. user1 is not a reviewer. How do we actually return the users who have both authored and reviewed articles? In base R you could do:

authors <- unique(articles$author)
reviewers <- unique(articles$reviewer)
> authors[authors %in% reviewers]
[1] "user2" "user3"

but how can we do this in the tidyverse?

Upvotes: 4

Views: 76

Answers (3)

M--
M--

Reputation: 28850

In base we can do:

unique(articles$author[articles$author %in% articles$reviewer])

 # [1] "user2" "user3"

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76402

Here is a solution without joins.

articles %>%
  filter(author %in% reviewer) %>%
  select(author) %>%
  distinct()
## A tibble: 2 x 1
#  author
#  <chr> 
#1 user2 
#2 user3 

Upvotes: 3

doctorG
doctorG

Reputation: 1731

It feels the answer would likely be in dplyr, and I have refined my approach down to:

articles %>% 
  semi_join((.), 
    by=c("author"="reviewer")) %>% 
  select(author) %>%
  distinct
# A tibble: 2 x 1
  author
  <chr> 
1 user2 
2 user3 

This is the correct answer, and I admit I've refined it somewhat even whilst writing this. But has anyone got any alternative takes on the problem? I'm actually quite pleased how it's boiled down, but it took me a little thinking so I'll post it in the hope it helps someone else.

Upvotes: 4

Related Questions