James

Reputation: 67

Matrix summation within groups in R dataframes

I have a dataframe which contains a groupID column and a column of matrices. I want to calculate the sum of the matrices in each group (matrix addition rather than the sum of the elements of the matrices).

I realise that's not very well explained, so here's an example.

library(tidyverse)
mydf <- data.frame(groupID = sample(c("A", "B", "C", "D"), 20, replace = TRUE)) %>% 
    mutate(mat = lapply(1:20, function(x) matrix(runif(9, 0, 10), nrow=3)))

Each observation has a groupID (A, B, C or D) and a 3x3 matrix of real numbers. I want to calculate the sum of all matrices in each group, i.e. 4 matrices, each with dim 3x3.

If mat were just a vector of scalar values, this would be a straightforward group_by(groupID) %>% summarise(sum(mat)). But since mat is a list of matrices, I get the following error:

Error in summarise_impl(.data, dots) : Evaluation error: invalid 'type' (list) of argument.

Although I imagine even if that did work, it would give me the sum of all the elements.

I've tried Reduce as well, since it works on an ungrouped list of matrices:

mydf %>% group_by(groupID) %>% summarise(Reduce('+', mat))
Error in summarise_impl(.data, dots) : `Reduce("+", mat)` must be length 1 (a summary value), not 9

Basically, I'm getting the impression that summarise only wants to output a single value for each group rather than a matrix.

Right now, the only solution I can think of is to loop through each unique value of groupID, filter the dataframe and sum what's left. But this isn't very elegant given my actual dataset has ~3000 different groups.

Any bright ideas much appreciated.

Thanks,

James

Upvotes: 2

Views: 249

Answers (1)

akrun

Reputation: 887881

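After grouping by 'groupID', we can use purrr's reduce within summarise. Wrapping the result in list() makes each group return a single list element (a list-column entry holding the 3x3 matrix) rather than nine separate values, which is what summarise was complaining about.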

library(tidyverse)
res <- mydf %>% 
         group_by(groupID) %>%
         summarise(mat = list(reduce(mat, `+`))) 
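
To show what the list-column result looks like in practice, here is a small sketch that names each summed matrix by its group and cross-checks one group against a manual filter-and-add. The names sums_by_group, first_group and manual are only illustrative, and the data is rebuilt with tibble() so the snippet runs on its own.

```r
library(dplyr)
library(purrr)

set.seed(24)
mydf <- tibble(
  groupID = sample(c("A", "B", "C", "D"), 20, replace = TRUE),
  mat = lapply(1:20, function(i) matrix(runif(9, 0, 10), nrow = 3))
)

res <- mydf %>%
  group_by(groupID) %>%
  summarise(mat = list(reduce(mat, `+`)))

# Name the list-column entries by group so a matrix can be looked up directly.
sums_by_group <- setNames(res$mat, res$groupID)

# Cross-check one group against a manual filter-and-add.
first_group <- res$groupID[1]
manual <- Reduce(`+`, mydf$mat[mydf$groupID == first_group])
all.equal(sums_by_group[[first_group]], manual)
```

Each element of res$mat is an ordinary 3x3 matrix, so sums_by_group[["A"]] (or whichever group you need) can be used like any other matrix.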

A base R option is to split by 'groupID' and then apply Reduce to each of the split elements

res2 <-  lapply(split(mydf, mydf$groupID), function(x) Reduce('+', x$mat))
identical(res$mat, unname(res2))
#[1] TRUE
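
Since the real data has ~3000 groups, another base R possibility, offered only as a sketch, is to flatten each matrix into a row and let rowsum() do the grouped addition in one call. This assumes every matrix has the same dimensions; the names d, flat, sums and res3 are made up for the example, and I have not benchmarked it against the approaches above.

```r
set.seed(24)
mydf <- data.frame(groupID = sample(c("A", "B", "C", "D"), 20, replace = TRUE))
mydf$mat <- lapply(1:20, function(i) matrix(runif(9, 0, 10), nrow = 3))

# Assumption: every matrix has the same dimensions as the first one.
d <- dim(mydf$mat[[1]])

# One row per observation, one column per matrix cell (column-major order).
flat <- t(vapply(mydf$mat, as.vector, numeric(prod(d))))

# rowsum() adds the rows that share a group label; rows come back sorted by group.
sums <- rowsum(flat, mydf$groupID)

# Reshape each summed row back into a matrix, named by group.
res3 <- lapply(seq_len(nrow(sums)),
               function(i) matrix(sums[i, ], nrow = d[1], ncol = d[2]))
names(res3) <- rownames(sums)

# Compare with the split/Reduce approach, allowing for floating-point rounding.
res2 <- lapply(split(mydf, mydf$groupID), function(x) Reduce(`+`, x$mat))
all.equal(res3, res2)
```

all.equal is used instead of identical here because rowsum() may add the values in a different order, so the results can differ by floating-point rounding.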

data

set.seed(24)
mydf <- data.frame(groupID = sample(c("A", "B", "C", "D"), 20, replace = TRUE)) %>% 
               mutate(mat = lapply(1:20, function(x) matrix(runif(9, 0, 10), nrow=3)))

Upvotes: 1
