Sum a list column of matrices in a data.frame by grouping factor

Question

I have a dataframe where one of the columns is a list containing a matrix for each row, defining a transition matrix for that observation.

library(tidyverse)
m <- matrix(1:4, ncol = 2)
d <- tibble(g = c('a', 'a', 'b', 'b', 'b', 'c'),
            m = rep(list(m), 6))

This looks like:

# A tibble: 6 × 2
      g             m
  <chr>        <list>
1     a <int [2 × 2]>
2     a <int [2 × 2]>
3     b <int [2 × 2]>
4     b <int [2 × 2]>
5     b <int [2 × 2]>
6     c <int [2 × 2]>

I want to get a list of matrices, one per group (a, b and c here), where each matrix is the sum of all the matrices for that grouping factor. The method needs to generalize to an arbitrary number of groups, because I will not know the number of grouping levels in advance.

I have tried by_slice and do, but all I can manage to output is either a sum of all the matrices or the sum for a single group on its own, not the per-group sums collected in a single result.

David Robinson · Accepted Answer

You can do this by nesting the matrices within groups (with tidyr's nest), which creates a list column that contains lists of matrices. You can then use purrr's map and reduce to sum up the matrices within each group's list:

results <- d %>%
  nest(data = -g) %>%
  mutate(summed = map(data, ~ reduce(.$m, `+`)))

Results:

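# A tibble: 3 × 3
      g             data        summed
  <chr>           <list>        <list>
1     a <tibble [2 × 1]> <int [2 × 2]>
2     b <tibble [3 × 1]> <int [2 × 2]>
3     c <tibble [1 × 1]> <int [2 × 2]>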

The summed column will have the matrices added up within each group.
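For comparison, and as a way to sanity-check the sums, here is a minimal base R sketch of the same idea. It is not the approach used in this answer: it swaps nest/map/reduce for split and Reduce, and it assumes the grouping values and the matrices are available as a plain vector and a plain list (the names g, mats and summed are made up for illustration).

```r
# Same example data as in the question, built without the tidyverse.
m <- matrix(1:4, ncol = 2)
g <- c("a", "a", "b", "b", "b", "c")
mats <- rep(list(m), 6)

# split() partitions the list of matrices by group;
# Reduce() then adds the matrices of each group elementwise.
summed <- lapply(split(mats, g), function(ms) Reduce(`+`, ms))

# Group b has three rows, so summed$b equals 3 * m.
summed$b
```

The result is already a named list with one element per group, so it works for any number of groups.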

If you wanted to turn this into a named list with items a/b/c of matrices, you could do:

lst <- results$summed
names(lst) <- results$g
lst

or alternatively, to get a one-row tibble with one list column per group:

results %>%
  select(-data) %>%
  spread(g, summed)
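Another possible route to the named list is tibble's deframe, which takes a two-column tibble and uses the first column as names and the second as values. The sketch below rebuilds the example from scratch so that it runs on its own; it assumes a recent tidyverse (tibble() and the nest(data = ...) syntax), and the pipeline around deframe is only illustrative.

```r
library(tidyverse)

# Example data from the question.
m <- matrix(1:4, ncol = 2)
d <- tibble(g = c("a", "a", "b", "b", "b", "c"),
            m = rep(list(m), 6))

# Nest the matrices per group, then add them up within each group.
results <- d %>%
  nest(data = -g) %>%
  mutate(summed = map(data, ~ reduce(.x$m, `+`)))

# deframe(): first column becomes the names, second column the values.
lst <- results %>%
  select(g, summed) %>%
  deframe()

names(lst)
```

Because the names come straight from the g column, nothing here depends on how many groups there are.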
