jknowles

Reputation: 479

Sum a list column of matrices in a data.frame by grouping factor

I have a data frame in which one column is a list column holding a matrix for each row; each matrix is the transition matrix for that observation.

library(tidyverse)
m <- matrix(1:4, ncol = 2)
# tibble() replaces data_frame(), which is deprecated in current tibble releases
d <- tibble(g = c('a', 'a', 'b', 'b', 'b', 'c'),
            m = rep(list(m), 6))

This looks like:

# A tibble: 6 × 2
      g             m
   <chr>        <list>
1     a <int [2 × 2]>
2     a <int [2 × 2]>
3     b <int [2 × 2]>
4     b <int [2 × 2]>
5     b <int [2 × 2]>
6     c <int [2 × 2]>

I want to get a list of matrices, one per group (a, b, and c in this example), where each matrix is the sum of all the matrices in that group. The method needs to generalize to an arbitrary number of groups, because I will not know the number of groups in advance.

I have tried by_slice and do, but all I can manage to output is either the sum of all the matrices or the sum for a single group on its own, not the per-group sums collected in a single result.

Upvotes: 3

Views: 395

Answers (2)

Joel Galang

Reputation: 76

Another way using group_by, summarise, and reduce:

m_sum <- function(l) {
  reduce(l, `+`) %>% list()
}

group_by(d, g) %>%
  summarise(m_sum = m_sum(m)) %>%
  select(m_sum) %>%
  unlist(recursive = FALSE)
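
The list returned by the pipeline above is not named by group. If group names are wanted, one possible variant (a sketch, not part of the original answer) is to keep the g column and pass the two-column result to tibble's deframe(), which uses the first column as names and the second as values. The variable name sums is only illustrative, and tibble() is assumed in place of data_frame().

```r
library(tidyverse)

m <- matrix(1:4, ncol = 2)
d <- tibble(g = c("a", "a", "b", "b", "b", "c"),
            m = rep(list(m), 6))

sums <- d %>%
  group_by(g) %>%
  # wrap the summed matrix in list() so summarise() can store it in a list column
  summarise(m_sum = list(reduce(m, `+`))) %>%
  # two columns (g, m_sum) become a named list: names from g, values from m_sum
  deframe()

names(sums)
sums$b
```

Because nothing here depends on the group labels, the same code should work for any number of groups.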

Upvotes: 3

David Robinson

Reputation: 78630

You can do this by nesting the matrices within groups (with tidyr's nest), which creates a list column that contains lists of matrices. You can then use purrr's map and reduce to sum up the matrices within each group's list:

results <- d %>%
  # tidyr >= 1.0 expects the new column to be named; nest(-g) still runs but warns
  nest(data = -g) %>%
  mutate(summed = map(data, ~ reduce(.$m, `+`)))

Results:

# A tibble: 3 × 3
      g             data        summed
  <chr>           <list>        <list>
1     a <tibble [2 × 1]> <int [2 × 2]>
2     b <tibble [3 × 1]> <int [2 × 2]>
3     c <tibble [1 × 1]> <int [2 × 2]>

The summed column will have the matrices added up within each group.
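
As a quick sanity check of that claim (a sketch that relies on the example data, where every row holds the same matrix m), each group's sum should equal m multiplied by the number of rows in the group. The name expected is illustrative, and the nest(data = -g) spelling assumes tidyr 1.0 or later.

```r
library(tidyverse)

m <- matrix(1:4, ncol = 2)
d <- tibble(g = c("a", "a", "b", "b", "b", "c"),
            m = rep(list(m), 6))

results <- d %>%
  nest(data = -g) %>%
  mutate(summed = map(data, ~ reduce(.x$m, `+`)))

# Every row holds the same m, so a group's sum is (rows in the group) * m
expected <- map(results$data, ~ nrow(.x) * m)
all.equal(results$summed, expected)
```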


If you wanted to turn this into a named list with items a/b/c of matrices, you could do:

lst <- results$summed
names(lst) <- results$g
lst

or alternatively, to get a one-row tibble with one list column per group (a, b, c):

results %>%
  select(-data) %>%
  spread(g, summed)
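
For comparison, the same named list can also be built in base R, without tidyr or purrr. This is a different technique from the one in this answer, shown only as a sketch: split() breaks the list column into one list of matrices per group, and Reduce() adds up each group's matrices. The name sums_base is illustrative.

```r
m <- matrix(1:4, ncol = 2)
d <- data.frame(g = c("a", "a", "b", "b", "b", "c"))
d$m <- rep(list(m), 6)  # attach the list column of matrices

# split() returns a named list with one element per group,
# each element being that group's list of matrices
by_group <- split(d$m, d$g)

# add up the matrices within each group
sums_base <- lapply(by_group, function(ms) Reduce(`+`, ms))

names(sums_base)
sums_base$a
```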

Upvotes: 6
