Reputation: 557
I have a large tibble with one nested list column. Each element of the nested list column has 10,000 iterations and i would like to apply a cumulative sum across these iterations by a grouping variable.
I have created a minimal reproducible example below
tibble(a = list(c(1,2),c(3,4), c(5,6), c(7,8)),
c = c(1,1, 2, 2))
The intended output should be
tibble(a = list(c(1,2),c(4,6), c(5,6), c(12,14)),
c = c(1,1, 2, 2))
Tried the follwoing syntax but its clearly wrong
x <- tibble(a = list(c(1,2),c(4,6), c(5,6), c(7,8)),
c = c(1,1, 2, 2))
x %>%
group_by(c) %>%
mutate(a = map(a,cumsum))
Any help greatly appreciated. I can potentially spread the data and add across the columns but that would be slow
Upvotes: 1
Views: 369
Reputation: 40121
One base R
option could be:
with(tbl, ave(a, c, FUN = function(x) Reduce(`+`, x, accumulate = TRUE)))
[[1]]
[1] 1 2
[[2]]
[1] 4 6
[[3]]
[1] 5 6
[[4]]
[1] 12 14
Upvotes: 2
Reputation: 34601
I think you're looking for the following though it doesn't match your desired output for the last two values (can you check that these are correct):
library(dplyr)
library(purrr)
library(tidyr)
df %>%
group_by(c) %>%
mutate(x = accumulate(a, `+`)) %>%
unnest(cols = c(a, x))
# A tibble: 8 x 3
# Groups: c [2]
a c x
<dbl> <dbl> <dbl>
1 1 1 1
2 2 1 2
3 3 1 4
4 4 1 6
5 5 2 5
6 6 2 6
7 7 2 12
8 8 2 14
Upvotes: 0