BHudson
BHudson

Reputation: 707

Create combinations by group and sum

I have data of names within an ID number along with a number of associated values. It looks something like this:

structure(list(id = c("a", "a", "b", "b"), name = c("bob", "jane", 
"mark", "brittney"), number = c(1L, 2L, 1L, 2L), value = c(1L, 
2L, 1L, 2L)), class = "data.frame", row.names = c(NA, -4L))

#   id     name number value
# 1  a      bob      1     1
# 2  a     jane      2     2
# 3  b     mark      1     1
# 4  b brittney      2     2

I would like to create all the combinations of name, regardless of how many there are, and paste them together separated with commas, and sum their number and value within each id. The desired output from the example above is then:

structure(list(id = c("a", "a", "a", "b", "b", "b"), name = c("bob", 
"jane", "bob, jane", "mark", "brittney", "mark, brittney"), number = c(1L, 
2L, 3L, 1L, 2L, 3L), value = c(1L, 2L, 3L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, -6L))

#   id           name number value
# 1  a            bob      1     1
# 2  a           jane      2     2
# 3  a      bob, jane      3     3
# 4  b           mark      1     1
# 5  b       brittney      2     2
# 6  b mark, brittney      3     3

Thanks all!

Upvotes: 5

Views: 456

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101337

A data.table option

setDT(df)[
  ,
  lapply(
    .SD,
    function(x) {
      unlist(
        lapply(
          seq_along(x),
          combn,
          x = x,
          function(v) {
            ifelse(all(is.character(v)), toString, sum)(v)
          }
        )
      )
    }
  ),
  id
]

gives

   id           name number value
1:  a            bob      1     1
2:  a           jane      2     2
3:  a      bob, jane      3     3
4:  b           mark      1     1
5:  b       brittney      2     2
6:  b mark, brittney      3     3

Upvotes: 2

lroha
lroha

Reputation: 34441

You can create pairwise indices using combn() and expand the data frame with these using slice(). Then just group by these row pairs and summarise. I'm assuming you want pairwise combinations but this can be adapted for larger sets if needed. Some code to handle groups < 2 is included but can be removed if these don't exist in your data.

library(dplyr)
library(purrr)

df1 %>%
  group_by(id) %>%
  slice(c(combn(seq(n()), min(n(), 2)))) %>%
  mutate(id2 = (row_number()-1) %/% 2) %>%
  group_by(id, id2) %>%
  summarise(name = toString(name),
            across(where(is.numeric), sum), .groups = "drop") %>%
  select(-id2) %>%
  bind_rows(df1 %>%
              group_by(id) %>%
              filter(n() > 1), .) %>%
  arrange(id) %>%
  ungroup()

# A tibble: 6 × 4
  id    name           number value
  <chr> <chr>           <int> <int>
1 a     bob                 1     1
2 a     jane                2     2
3 a     bob, jane           3     3
4 b     mark                1     1
5 b     brittney            2     2
6 b     mark, brittney      3     3

Edit:

To adapt for all possible combinations you can iterate over the values up to the max group size. Using edited data which has a couple of rows added to the first group:

map_df(seq(max(table(df2$id))), ~
         df2 %>%
         group_by(id) %>%
         slice(c(combn(seq(n()), .x * (.x <= n())))) %>%
         mutate(id2 = (row_number() - 1) %/% .x) %>%
         group_by(id, id2) %>%
         summarise(name = toString(name),
                   across(where(is.numeric), sum), .groups = "drop")
       ) %>%
  select(-id2) %>%
  arrange(id)

# A tibble: 18 × 4
   id    name                      number value
   <chr> <chr>                      <int> <int>
 1 a     bob                            1     1
 2 a     jane                           2     2
 3 a     sophie                         1     1
 4 a     jeremy                         2     2
 5 a     bob, jane                      3     3
 6 a     bob, sophie                    2     2
 7 a     bob, jeremy                    3     3
 8 a     jane, sophie                   3     3
 9 a     jane, jeremy                   4     4
10 a     sophie, jeremy                 3     3
11 a     bob, jane, sophie              4     4
12 a     bob, jane, jeremy              5     5
13 a     bob, sophie, jeremy            4     4
14 a     jane, sophie, jeremy           5     5
15 a     bob, jane, sophie, jeremy      6     6
16 b     mark                           3     5
17 b     brittney                       4     6
18 b     mark, brittney                 7    11

Data for df2:

df2 <- structure(list(id = c("a", "a", "a", "a", "b", "b"), name = c("bob", 
                                                                     "jane", "sophie", "jeremy", "mark", "brittney"), number = c(1L, 
                                                                                                                                 2L, 1L, 2L, 3L, 4L), value = c(1L, 2L, 1L, 2L, 5L, 6L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                              -6L))

Upvotes: 4

Darren Tsai
Darren Tsai

Reputation: 35554

You could use group_modify() + add_row():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify( ~ .x %>%
    summarise(name = toString(name), across(c(number, value), sum)) %>%
    add_row(.x, .)
  ) %>%
  ungroup()

# # A tibble: 6 × 4
#   id    name           number value
#   <chr> <chr>           <int> <int>
# 1 a     bob                 1     1
# 2 a     jane                2     2
# 3 a     bob, jane           3     3
# 4 b     mark                1     1
# 5 b     brittney            2     2
# 6 b     mark, brittney      3     3

Upvotes: 5

Related Questions