Reputation: 788
a<-list(5,6,8,4,5,2)
b<-c(3,2,1)
I want to sum the "a" according to "b" to form a new list.
that is (5+6+8),(4+5),2
The expected result is:
[[1]]
[1] 19
[[2]]
[1] 9
[[3]]
[1] 2
I use the following code to work out, but I wonder whether there is more convenient way to solve this problem. Thank you!
p<-rep(1:length(b),b)
as.list(sapply(1:length(b), function(x) {sum(as.numeric(a)[which(p==x)])}))
Upvotes: 4
Views: 177
Reputation: 35314
I've thought of an interesting solution to this problem, which is perhaps a little strange, but I like it:
as.list(diff(c(0,cumsum(a)[cumsum(b)])));
## [[1]]
## [1] 19
##
## [[2]]
## [1] 9
##
## [[3]]
## [1] 2
##
First we take a complete cumulative sum with cumsum()
. Note: I originally thought cumsum()
required an atomic vector (like sum()
, for example), and thus I initially had a call to unlist()
prior to cumsum()
, but thanks to @thelatemail for pointing out that it can work with lists as well!
cumsum(a);
## [1] 5 11 19 23 28 30
Then the endpoints of the ranges to be summed are extracted by indexing on cumsum(b)
:
cumsum(b);
## [1] 3 5 6
cumsum(a)[cumsum(b)];
## [1] 19 28 30
We can produce the required summations by taking diff()
with a leading zero:
diff(c(0,cumsum(a)[cumsum(b)]));
## [1] 19 9 2
And since you want the result as a list, we finally need a call to as.list()
:
as.list(diff(c(0,cumsum(a)[cumsum(b)])));
## [[1]]
## [1] 19
##
## [[2]]
## [1] 9
##
## [[3]]
## [1] 2
##
lightsnail <- function() { p<-rep(1:length(b),b); as.list(sapply(1:length(b), function(x) {sum(as.numeric(a)[which(p==x)])})); };
thelatemail <- function() as.list(tapply(unlist(a), rep(seq_along(b), b), sum)); ## added as.list()
psidom <- function() lapply(split(unlist(a), rep(seq_along(b), b)), sum);
tfc <- function() as.list(aggregate(unlist(a), list(rep(1:length(b),b)), sum)[["x"]]);
user20650 <- function() as.list(rowsum(unlist(a), rep(seq_along(b), b), reorder=FALSE));
bgoldst <- function() as.list(diff(c(0,cumsum(a)[cumsum(b)])));
expected <- list(19,9,2);
identical(expected,lightsnail());
## [1] TRUE
identical(expected,unname(thelatemail())); ## ignore names
## [1] TRUE
identical(expected,unname(psidom())); ## ignore names
## [1] TRUE
identical(expected,tfc());
## [1] TRUE
identical(expected,user20650());
## [1] TRUE
identical(expected,bgoldst());
## [1] TRUE
library(microbenchmark);
microbenchmark(lightsnail(),thelatemail(),psidom(),tfc(),user20650(),bgoldst(),times=1e3L);
## Unit: microseconds
## expr min lq mean median uq max neval
## lightsnail() 26.088 33.358 37.34079 37.206 39.344 100.927 1000
## thelatemail() 121.881 135.139 151.77782 142.837 150.963 3547.386 1000
## psidom() 48.753 55.595 61.13800 59.016 63.507 276.693 1000
## tfc() 574.767 613.256 646.64302 628.652 645.757 1923.586 1000
## user20650() 17.534 23.094 25.49522 25.232 26.943 101.782 1000
## bgoldst() 10.264 14.969 17.61914 17.535 18.817 82.965 1000
Upvotes: 8
Reputation: 48201
First, there is no reason to use lists when you are actually working with vectors. Then another two approaches are as follows:
a <- c(5, 6, 8, 4, 5, 2)
b <- c(3, 2, 1)
f <- function(a, b) c(sum(head(a, b[1])), if(length(b) > 1) f(tail(a, -b[1]), b[-1]))
f(a, b)
# [1] 19 9 2
library(Matrix)
(a %*% bdiag(lapply(b, rep, x = 1)))[1, ]
# [1] 19 9 2
The first one is recursive, it keeps shortening a
and b
in every call, whereas the second approach constructs an auxiliary block diagonal matrix.
Upvotes: 2
Reputation: 214927
Another option: lapply(split(unlist(a), rep(seq_along(b), b)), sum)
Upvotes: 3
Reputation: 596
Not sure if it is more convenient, but you can use aggregate with the index you created (p):
as.list(aggregate(unlist(a), list(rep(1:length(b),b)), sum)[["x"]])
Upvotes: 0