Direct way for 'distinct() groupwise'

Question

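I would like to do the equivalent of distinct(), but for whole groups: if several groups contain exactly the same set of procedures, only the first of them should be kept. Here is an example: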

data <- data.frame(
  group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)

  group procedure
1     1         A
2     1         B
3     2         A
4     3         A
5     3         B
6     4         A
7     4         X
8     5         A
9     5         X

I am expecting this:

Note: group_id is only an intermediate column and is not important:

  group procedure group_id
  <dbl> <chr>        <int>
1     1 A                2
2     1 B                2
3     2 A                1
4     4 A                3
5     4 X                3

I use this working code:

library(dplyr)
library(tidyr)

data %>%
  summarise(procedure = toString(sort(procedure)), .by = group) %>%
  mutate(group_id = as.integer(factor(procedure))) %>% 
  distinct(group_id, .keep_all = TRUE) %>% 
  separate_rows(procedure)

Is there a more direct method available? For context, my dataset contains 23,000 rows with numerous groups, and I need to identify and evaluate the main member of each group. Therefore, I'm looking for a way to efficiently distinguish and assess all unique groups. Could you suggest an approach to facilitate this evaluation?

ThomasIsCoding · Accepted Answer

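I don't know if this is short enough for you, but you can store each group's sorted procedures as a list element and drop the groups whose list is a duplicate. Sorting makes the comparison independent of the row order within a group.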

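data %>%
  # one row per group, with its sorted procedures stored in a list-column
  summarise(procedure = list(sort(procedure)), .by = group) %>%
  # keep only the first group for each distinct procedure vector
  filter(!duplicated(procedure)) %>%
  unnest(procedure)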

which gives

# A tibble: 5 × 2
  group procedure
  <dbl> <chr>
1     1 A
2     1 B
3     2 A
4     4 A
5     4 X
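
For comparison, the same idea can be sketched in base R without any packages. This is only a rough alternative under some assumptions, not a replacement for the pipeline above: the names sets, keep and result are made up for illustration, and because split() orders groups by their sorted values, "first" here means the lowest group value rather than the first one to appear in the data (the two coincide in the example).

```r
data <- data.frame(
  group = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
  procedure = c("A", "B", "A", "A", "B", "A", "X", "A", "X")
)

# One sorted procedure vector per group, as a named list
# (names are the group values converted to character).
sets <- lapply(split(data$procedure, data$group), sort)

# Names of the groups whose procedure vector has not been seen before.
keep <- names(sets)[!duplicated(sets)]

# Keep every row that belongs to one of those groups.
result <- data[as.character(data$group) %in% keep, ]
print(result)
```

On the example data this keeps groups 1, 2 and 4, matching the output above. As in the answer, a group with a repeated procedure (say A, A) is treated as different from a group with a single A; wrap sort() around unique() if such groups should be considered equal.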
