Reputation: 375
I am trying to compute the cumulative sum of a variable (v) within groups ("a" and "b") of a data frame. How can I get the result shown at the bottom, whose rows even carry the correct original row numbers, into column cs of my data frame?
> library(nlme)
> g <- factor(c("a","b","a","b","a","b","a","b","a","b","a","b"))
> v <- c(1,4,1,4,1,4,2,8,2,8,2,8)
> cs <- rep(0,12)
> d <- data.frame(g,v,cs)
> d
g v cs
1 a 1 0
2 b 4 0
3 a 1 0
4 b 4 0
5 a 1 0
6 b 4 0
7 a 2 0
8 b 8 0
9 a 2 0
10 b 8 0
11 a 2 0
12 b 8 0
> r <- gapply(d, FUN="cumsum", form=~g, which="v")
> r
$a
v
1 1
3 2
5 3
7 5
9 7
11 9
$b
v
2 4
4 8
6 12
8 20
10 28
12 36
> str(r)
List of 2
$ a:'data.frame': 6 obs. of 1 variable:
..$ v: num [1:6] 1 2 3 5 7 9
$ b:'data.frame': 6 obs. of 1 variable:
..$ v: num [1:6] 4 8 12 20 28 36
I guess I could figure out some laborious way to get the data from those dataframes into d$cs, but there's got to be some easy tweak I'm missing.
Upvotes: 16
Views: 14893
Reputation: 389235
Here are a few package-based options. plyr is retired and has been superseded by dplyr (the .by argument used below needs dplyr 1.1.0 or later):
library(dplyr)
d %>% mutate(cs = cumsum(v), .by = g)
# g v cs
#1 a 1 1
#2 b 4 4
#3 a 1 2
#4 b 4 8
#5 a 1 3
#6 b 4 12
#7 a 2 5
#8 b 8 20
#9 a 2 7
#10 b 8 28
#11 a 2 9
#12 b 8 36
For larger data, collapse is very fast, and its syntax is very similar to dplyr:
library(collapse)
d |> fgroup_by(g) |> fmutate(cs = cumsum(v))
And to do the same thing in data.table, we can do the following:
library(data.table)
setDT(d)[, cs := cumsum(v), by = g]
Upvotes: 0
Reputation: 856
> library(nlme)
> g <- factor(c("a","b","a","b","a","b","a","b","a","b","a","b"))
> v <- c(1,4,1,4,1,4,2,8,2,8,2,8)
> cs <- rep(0,12)
> d <- data.frame(g,v,cs)
> d <- d[order(d$g),]
> temp <- by(d$v,d$g,cumsum)
> d$cs <- do.call("c",temp)
> d
g v cs
1 a 1 1
3 a 1 2
5 a 1 3
7 a 2 5
9 a 2 7
11 a 2 9
2 b 4 4
4 b 4 8
6 b 4 12
8 b 8 20
10 b 8 28
12 b 8 36
This is another solution, using the by function, but I had to order the data by group first, so the rows of d end up sorted by g rather than in their original order.
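If the original row order matters, the sorting step can be avoided by writing the grouped results back through order(d$g). This is only a sketch: it swaps by for tapply (a close base R relative) and uses a helper variable by_group that is not part of the answer above.

```r
# Same data as in the question.
g <- factor(c("a","b","a","b","a","b","a","b","a","b","a","b"))
v <- c(1,4,1,4,1,4,2,8,2,8,2,8)
d <- data.frame(g, v)

# Per-group cumulative sums, concatenated in group order
# (all "a" values first, then all "b" values).
by_group <- unlist(tapply(d$v, d$g, cumsum), use.names = FALSE)

# order(d$g) lists the row positions in that same group order, so
# assigning through it puts each value back on the row it came from.
cs <- numeric(nrow(d))
cs[order(d$g)] <- by_group
d$cs <- cs
d
```

Because order() leaves ties in their original order, the values within each group line up with the rows they were computed from, and d itself is never reordered.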
Upvotes: 0
Reputation: 176718
I would use ave. If you look at the source of ave, you'll see it essentially wraps Martin Morgan's solution.
R> g <- factor(c("a","b","a","b","a","b","a","b","a","b","a","b"))
R> v <- c(1,4,1,4,1,4,2,8,2,8,2,8)
R> d <- data.frame(g,v)
R> d$cs <- ave(v, g, FUN=cumsum)
R> d
g v cs
1 a 1 1
2 b 4 4
3 a 1 2
4 b 4 8
5 a 1 3
6 b 4 12
7 a 2 5
8 b 8 20
9 a 2 7
10 b 8 28
11 a 2 9
12 b 8 36
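To make the "wraps" remark concrete, here is a simplified sketch of the idea for a single grouping factor. The function name my_ave is hypothetical, and this is not the actual source of ave, which also handles several grouping variables.

```r
# Hypothetical, simplified stand-in for ave() with one grouping factor:
# split x by group, apply FUN to each piece, and write the pieces back
# into their original positions with the `split<-` replacement function.
my_ave <- function(x, g, FUN) {
  split(x, g) <- lapply(split(x, g), FUN)
  x
}

g <- factor(c("a","b","a","b","a","b","a","b","a","b","a","b"))
v <- c(1,4,1,4,1,4,2,8,2,8,2,8)

cs_sketch <- my_ave(v, g, cumsum)
cs_ave    <- ave(v, g, FUN = cumsum)

# Compare the sketch with the real ave() on the question's data.
all.equal(cs_sketch, cs_ave)
```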
Upvotes: 10
Reputation: 46886
split<- is a pretty weird beast:
split(d$cs, d$g) <- lapply(split(d$v, d$g), cumsum)
leading to
> d
g v cs
1 a 1 1
2 b 4 4
3 a 1 2
4 b 4 8
5 a 1 3
6 b 4 12
7 a 2 5
8 b 8 20
9 a 2 7
10 b 8 28
11 a 2 9
12 b 8 36
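For readers who have not met split<- before, a tiny example may help. The vectors x and f below are made up for illustration and are not part of the question's data.

```r
x <- c(10, 20, 30, 40)
f <- factor(c("a", "b", "a", "b"))

# split() pulls the values apart by group:
# group "a" holds positions 1 and 3, group "b" holds positions 2 and 4.
parts <- split(x, f)

# `split<-` goes the other way: each list element is written back into
# the positions its group came from, in the order of the factor levels.
split(x, f) <- list(a = c(1, 2), b = c(3, 4))
x
# [1] 1 3 2 4
```

The one-liner in this answer uses the same mechanism: lapply produces one cumulative-sum vector per group, and split<- scatters them back into d$cs in the original row order.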
Upvotes: 13
Reputation: 173697
My tool of choice for these things is the plyr package:
require(plyr)
> ddply(d,.(g),transform,cs = cumsum(v))
g v cs
1 a 1 1
2 a 1 2
3 a 1 3
4 a 2 5
5 a 2 7
6 a 2 9
7 b 4 4
8 b 4 8
9 b 4 12
10 b 8 20
11 b 8 28
12 b 8 36
Upvotes: 7