Reputation: 159

Filter only rows that are duplicated using dplyr

I have been trying for a while to solve a problem close to the one presented in this issue, with no success. The goal is to flag items that are duplicated within a group, including the original occurrence used for the comparison, using dplyr (I prefer dplyr over base or data.table).

The solution I tried is as follows:

> a <- data.frame(name=c("a","b","b","b","a","a"),position=c(1,2,1,2,2,2),achieved=c(1,0,0,0,1,0))
> a %>% group_by(name,achieved) %>% mutate(duplicated=duplicated(position))
# A tibble: 6 x 4
# Groups:   name, achieved [3]
  name  position achieved duplicated
  <fct>    <dbl>    <dbl> <lgl>     
1 a            1        1 FALSE     
2 b            2        0 FALSE     
3 b            1        0 FALSE     
4 b            2        0 TRUE      
5 a            2        1 FALSE     
6 a            2        0 FALSE

I know this is close to what I want, but it only flags the values that are duplicated after the first occurrence. I would like a dplyr solution that flags all duplicated values per group, including the first one; it would probably also improve my understanding of dplyr.

The desired output would be as follows:

# A tibble: 6 x 4
# Groups:   name, achieved [3]
  name  position achieved duplicated
  <fct>    <dbl>    <dbl> <lgl>     
1 a            1        1 FALSE     
2 b            2        0 TRUE      
3 b            1        0 FALSE     
4 b            2        0 TRUE      
5 a            2        1 FALSE     
6 a            2        0 FALSE

Thanks in advance.

Upvotes: 2

Answers (2)

MrFlick

Reputation: 206232

It seems like you want to group by all of name, position, and achieved, and then check whether there is more than one record in each group:

a %>% group_by(name,achieved, position) %>% mutate(duplicated = n()>1)

#   name  position achieved duplicated
#  <fct>    <dbl>    <dbl> <lgl>     
# 1 a            1        1 FALSE     
# 2 b            2        0 TRUE      
# 3 b            1        0 FALSE     
# 4 b            2        0 TRUE      
# 5 a            2        1 FALSE     
# 6 a            2        0 FALSE
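
Since the title asks about filtering, the same count can probably be used directly inside `filter()` instead of `mutate()`. This is a minimal sketch, assuming the data frame `a` from the question; the name `dups` is only illustrative.

```r
library(dplyr)

# Same data as in the question
a <- data.frame(
  name     = c("a", "b", "b", "b", "a", "a"),
  position = c(1, 2, 1, 2, 2, 2),
  achieved = c(1, 0, 0, 0, 1, 0)
)

# Keep only rows whose (name, achieved, position) combination
# occurs more than once, including the first occurrence
dups <- a %>%
  group_by(name, achieved, position) %>%
  filter(n() > 1) %>%
  ungroup()

dups
```

With this data, only the two rows with name "b", position 2, achieved 0 should remain.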

Upvotes: 3

arg0naut91

Reputation: 14764

Try this:

a %>%
  group_by_all() %>%
  mutate(duplicated = n() > 1)
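
Two variations that may be useful, sketched under the assumption that `a` is the data frame from the question. In dplyr 1.0.0 and later, `group_by_all()` is superseded, and `group_by(across(everything()))` is the equivalent spelling. Alternatively, if you want to keep the original grouping by name and achieved, combining `duplicated()` with its `fromLast = TRUE` counterpart flags the first occurrence as well. The names `res_all` and `res_both` are only illustrative.

```r
library(dplyr)

# Same data as in the question
a <- data.frame(
  name     = c("a", "b", "b", "b", "a", "a"),
  position = c(1, 2, 1, 2, 2, 2),
  achieved = c(1, 0, 0, 0, 1, 0)
)

# Variation 1: group by every column using across() (dplyr >= 1.0.0)
res_all <- a %>%
  group_by(across(everything())) %>%
  mutate(duplicated = n() > 1) %>%
  ungroup()

# Variation 2: keep the question's grouping and scan position from
# both ends, so the first and later occurrences are both flagged
res_both <- a %>%
  group_by(name, achieved) %>%
  mutate(duplicated = duplicated(position) |
                      duplicated(position, fromLast = TRUE)) %>%
  ungroup()
```

Both variations should mark rows 2 and 4 as TRUE and the rest as FALSE, matching the desired output in the question.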

Upvotes: 2
