Pavel Shliaha
Pavel Shliaha

Reputation: 935

How to find which groups have identical values in a specific column using dplyr?

I have a table that contains letters and numbers:

xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
              number = c (1, 2, 3, 1, 2, 3, 4, 5, 6))

I would like to first group data by "letter" and then check if there are any two groups that have identical values in number column. These would be groups with letters "a" and "b" in the "letter" column.

The result would look like this

xx <- tibble (letter = c (rep ("a", 3), rep ("b", 3), rep ("c", 3)),
              number = c (1, 2, 3, 1, 2, 3, 4, 5, 6),
              duplicated = c (rep (TRUE, 6), rep (FALSE, 3)) )

is there a way to do this elegantly in dplyr?

Upvotes: 5

Views: 1458

Answers (2)

Kra.P
Kra.P

Reputation: 15143

You may try:

xx %>%
  distinct() %>%
  group_by(number) %>%
  mutate(n = n()) %>%
  mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
  select(-n)

      letter number duplicated
      <chr>   <dbl> <lgl>     
    1 a           1 TRUE      
    2 a           2 TRUE      
    3 a           3 TRUE      
    4 b           1 TRUE      
    5 b           2 TRUE      
    6 b           3 TRUE      
    7 c           4 FALSE     
    8 c           5 FALSE     
    9 c           6 FALSE 

distinct is for if there are some duplicated inside of group,

xx <- tibble (letter = c (rep ("a", 4), rep ("b", 3), rep ("c", 3)),
              number = c (1,1, 2, 3, 1, 2, 3, 4, 4, 6))
   letter number
   <chr>   <dbl>
 1 a           1
 2 a           1
 3 a           2
 4 a           3
 5 b           1
 6 b           2
 7 b           3
 8 c           4
 9 c           4
10 c           6
    xxx <- xx %>%
      distinct() %>%
      group_by(number) %>%
      mutate(n = n()) %>%
      mutate(duplicated = ifelse(n>1, TRUE, FALSE)) %>%
      select(-n)

    xx %>% 
      full_join(xxx, by = c("letter", "number"))

   letter number duplicated
   <chr>   <dbl> <lgl>     
 1 a           1 TRUE      
 2 a           1 TRUE      
 3 a           2 TRUE      
 4 a           3 TRUE      
 5 b           1 TRUE      
 6 b           2 TRUE      
 7 b           3 TRUE      
 8 c           4 FALSE     
 9 c           4 FALSE     
10 c           6 FALSE  

Thanks to @LMc

xxx <- xx %>%
  distinct() %>%
  add_count(number) %>%
  mutate(duplicated = n> 1) %>%
  select(-n)
xxx
xx %>% 
  full_join(xxx, by = c("letter", "number"))

Upvotes: 2

LMc
LMc

Reputation: 18722

You can use cur_group() to subset xx and test for containment:

library(dplyr)

xx %>% 
  group_by(letter) %>% 
  mutate(duplicated = number %in% xx$number[!xx$letter %in% cur_group()]) %>%
  ungroup()
  1. cur_group() represents the value of the current grouping variable (eg a, b, c).
  2. We use the value of cur_group() to subset the vector xx$number for those that are not part of the current group (!xx$letter %in% cur_group()).
  3. Lastly, we check if the current group number is in that subset (number %in% ...).

Output

  letter number duplicated
  <chr>   <dbl> <lgl>     
1 a           1 TRUE      
2 a           2 TRUE      
3 a           3 TRUE      
4 b           1 TRUE      
5 b           2 TRUE      
6 b           3 TRUE      
7 c           4 FALSE     
8 c           5 FALSE     
9 c           6 FALSE     

Upvotes: 2

Related Questions