Reputation: 935

How to find which groups have identical values in a specific column using dplyr?

I have a table that contains letters and numbers:

xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
              number = c (1, 2, 3, 1, 2, 3, 4, 5, 6))

I would like to first group data by "letter" and then check if there are any two groups that have identical values in number column. These would be groups with letters "a" and "b" in the "letter" column.

The result would look like this

xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
              number = c (1, 2, 3, 1, 2, 3, 4, 5, 6),
              duplicated = c (rep (TRUE, 6), rep (FALSE, 3)) )

is there a way to do this elegantly in dplyr?

Upvotes: 5

Answers (2)

Kra.P

Reputation: 15153

You may try:

xx %>%
  distinct() %>%
  group_by(number) %>%
  mutate(n = n()) %>%
  mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
  select(-n)

      letter number duplicated
      <chr>   <dbl> <lgl>     
    1 a           1 TRUE      
    2 a           2 TRUE      
    3 a           3 TRUE      
    4 b           1 TRUE      
    5 b           2 TRUE      
    6 b           3 TRUE      
    7 c           4 FALSE     
    8 c           5 FALSE     
    9 c           6 FALSE

distinct is for if there are some duplicated inside of group,

xx <- tibble (letter = c (rep ("a", 4), rep ("b", 3), rep ("c", 3)),
              number = c (1,1, 2, 3, 1, 2, 3, 4, 4, 6))
   letter number
   <chr>   <dbl>
 1 a           1
 2 a           1
 3 a           2
 4 a           3
 5 b           1
 6 b           2
 7 b           3
 8 c           4
 9 c           4
10 c           6
    xxx <- xx %>%
      distinct() %>%
      group_by(number) %>%
      mutate(n = n()) %>%
      mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
      select(-n)

    xx %>% 
      full_join(xxx, by = c("letter", "number"))

   letter number duplicated
   <chr>   <dbl> <lgl>     
 1 a           1 TRUE      
 2 a           1 TRUE      
 3 a           2 TRUE      
 4 a           3 TRUE      
 5 b           1 TRUE      
 6 b           2 TRUE      
 7 b           3 TRUE      
 8 c           4 FALSE     
 9 c           4 FALSE     
10 c           6 FALSE

Thanks to @LMc

xxx <- xx %>%
  distinct() %>%
  add_count(number) %>%
  mutate(duplicated = n> 1) %>%
  select(-n)
xxx
xx %>% 
  full_join(xxx, by = c("letter", "number"))

Upvotes: 2

LMc

Reputation: 18732

You can use cur_group() to subset xx and test for containment:

library(dplyr)

xx %>% 
  group_by(letter) %>% 
  mutate(duplicated = number %in% xx$number[!xx$letter %in% cur_group()]) %>%
  ungroup()

cur_group() represents the value of the current grouping variable (eg a, b, c).
We use the value of cur_group() to subset the vector xx$number for those that are not part of the current group (!xx$letter %in% cur_group()).
Lastly, we check if the current group number is in that subset (number %in% ...).

Output

  letter number duplicated
  <chr>   <dbl> <lgl>     
1 a           1 TRUE      
2 a           2 TRUE      
3 a           3 TRUE      
4 b           1 TRUE      
5 b           2 TRUE      
6 b           3 TRUE      
7 c           4 FALSE     
8 c           5 FALSE     
9 c           6 FALSE

Upvotes: 2

How to find which groups have identical values in a specific column using dplyr?

Answers (2)

Thanks to @LMc

Related Questions