Reputation: 67
I have a dataframe which contains a groupID column and a column of matrices. I want to calculate the sum of the matrices in each group (matrix addition rather than the sum of the elements of the matrices).
I realise that that's fairly poorly explained - here's an example.
library(tidyverse)
mydf <- data.frame(groupID= sample(c("A", "B", "C", "D"), 20, replace = T)) %>%
mutate(mat = lapply(1:20, function(x) matrix(runif(9, 0, 10), nrow=3)))
Each observation has a groupID (A, B, C or D) and a 3x3 matrix of real numbers. I want to calculate the sum of all matrices in each group, i.e. 4 matrices, each of dimension 3x3.
If mat was just a vector of scalar values, it would be a straightforward case of group_by(groupID) %>% summarise(sum(mat)). But since mat is technically a list of matrices, I get the following error:
Error in summarise_impl(.data, dots) : Evaluation error: invalid 'type' (list) of argument.
Although I imagine even if that did work, it would give me the sum of all the elements.
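To make the distinction concrete, here is a minimal base R sketch. The two 2x2 matrices in `mats` are made up for illustration and are not part of the data above; they only show how element-wise addition differs from summing every element.

```r
# Two small 2x2 matrices standing in for one group's `mat` entries (illustrative only)
mats <- list(matrix(1:4, nrow = 2), matrix(5:8, nrow = 2))

# Element-wise matrix addition: the result keeps the 2x2 shape
mat_sum <- Reduce(`+`, mats)
mat_sum

# Summing every element instead collapses everything to a single number
elem_sum <- sum(unlist(mats))
elem_sum
```

The first result is what I am after for each group; the second is what a plain sum would produce.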
I've tried Reduce as well, since it works on an ungrouped list of matrices:
mydf %>% group_by(groupID) %>% summarise(Reduce('+', mat))
Error in summarise_impl(.data, dots) : `Reduce("+", mat)` must be length 1 (a summary value), not 9
Basically, I'm getting the impression that summarise only wants to output a single value for each group rather than a matrix.
Right now, the only solution I can think of is to loop through each unique value of groupID, filter the dataframe and sum what's left. But this isn't very elegant given my actual dataset has ~3000 different groups.
Any bright ideas much appreciated.
Thanks,
James
Upvotes: 2
Views: 249
Reputation: 887881
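After grouping by 'groupID', we can use reduce within summarise, wrapping the result in list() so that each group returns a single list element rather than a 9-element matrix: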
library(tidyverse)
res <- mydf %>%
group_by(groupID) %>%
summarise(mat = list(reduce(mat, `+`)))
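As a quick check of the same pattern, here is a small sketch on toy data where the expected sums are easy to see by eye. The names `toy` and `res_toy` are made up for this illustration, and the constant-filled matrices are an assumption chosen for readability.

```r
library(dplyr)
library(purrr)

# Toy data: two groups, each matrix filled with a single constant
toy <- tibble(
  groupID = c("A", "A", "B"),
  mat = list(matrix(1, 2, 2), matrix(2, 2, 2), matrix(10, 2, 2))
)

res_toy <- toy %>%
  group_by(groupID) %>%
  # list() wraps the summed matrix so each group yields one list element
  summarise(mat = list(reduce(mat, `+`)))

# Summed matrix for group "A": every entry is 1 + 2
res_toy$mat[[1]]
```

The result has one row per group, and the summed matrix for a group can be pulled out of the list column with `[[`.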
A base R option would be to split by 'groupID' and then use Reduce by looping over the split elements:
res2 <- lapply(split(mydf, mydf$groupID), function(x) Reduce('+', x$mat))
identical(res$mat, unname(res2))
#[1] TRUE
The data used for the comparison above, with a seed set for reproducibility:
set.seed(24)
mydf <- data.frame(groupID= sample(c("A", "B", "C", "D"), 20, replace = T)) %>%
mutate(mat = lapply(1:20, function(x) matrix(runif(9, 0, 10), nrow=3)))
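If the tidyverse is not needed at all, a variant of the base R idea is to split the list of matrices directly by the grouping vector. This is only a sketch: it assumes the group labels and matrices are held in two plain objects (`groupID` and `mat`, named here for illustration) rather than in a data frame.

```r
set.seed(24)
groupID <- sample(c("A", "B", "C", "D"), 20, replace = TRUE)
mat <- lapply(1:20, function(x) matrix(runif(9, 0, 10), nrow = 3))

# Split the list of matrices by group, then add up each group's matrices
sums <- lapply(split(mat, groupID), function(m) Reduce(`+`, m))

names(sums)     # group labels present in the sample
dim(sums[[1]])  # each summed matrix is still 3 x 3
```

The result is a named list, so a single group's total can be looked up by label, for example `sums[["A"]]` when that group occurs in the data.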
Upvotes: 1