Reputation: 7248
I want to append a column to a data frame that holds the result of a cumulative function. I can accomplish this with unsplit/split. Here is some example data:
> set.seed(3)
> d <- data.frame(type=sample(c('a','b'),10,replace=TRUE), val=rnorm(10))
> d
type val
1 a 0.03012394
2 b 0.08541773
3 a 1.11661021
4 a -1.21885742
5 b 1.26736872
6 b -0.74478160
7 a -1.13121857
8 a -0.71635849
9 b 0.25265237
10 b 0.15204571
So I use split/lapply/unsplit to get my desired result:
> d$sum <- unsplit(lapply(split(d,d$type), function(x) { cumsum(x$val)}), d$type)
> d
type val sum
1 a 0.03012394 0.03012394
2 b 0.08541773 0.08541773
3 a 1.11661021 1.14673416
4 a -1.21885742 -0.07212326
5 b 1.26736872 1.35278645
6 b -0.74478160 0.60800486
7 a -1.13121857 -1.20334183
8 a -0.71635849 -1.91970032
9 b 0.25265237 0.86065723
10 b 0.15204571 1.01270293
And this is the desired result. But I'd really like to use the simplified syntax of plyr in this case. So I tried:
> d$sum2 <- unsplit(dlply(d, .(type), summarise, cumsum(val)), d$type)
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': '1', '2', '3', '4', '5'
The output of dlply and the output of lapply/split are almost the same, except that the dlply output carries some extra attributes that I think unsplit will ignore, and the dlply output has re-indexed the row.names. I think the latter is what the error is complaining about.
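A minimal base-R sketch of what seems to be going on, assuming the dlply pieces behave like ordinary data frames whose row names restart at 1 in each group. The pieces are built with lapply/split rather than dlply so the snippet needs no packages, the data are the rows printed above entered by hand, and the names pieces, vectors, and the column s are made up for illustration.

```r
# The example data from above, entered by hand so the result does not
# depend on the random number generator.
d <- data.frame(
  type = c("a", "b", "a", "a", "b", "b", "a", "a", "b", "b"),
  val  = c(0.03012394, 0.08541773, 1.11661021, -1.21885742, 1.26736872,
           -0.74478160, -1.13121857, -0.71635849, 0.25265237, 0.15204571)
)

# Stand-in for the dlply() output: one data frame per group, each with
# row names restarting at "1".
pieces <- lapply(split(d, d$type), function(x) data.frame(s = cumsum(x$val)))

# unsplit() rebuilds row names when the pieces are data frames, so the
# repeated "1", "2", ... trigger the error; capture its message here.
msg <- tryCatch(unsplit(pieces, d$type), error = function(e) conditionMessage(e))
print(msg)

# Pulling the plain vector out of each piece avoids the row-name step.
vectors <- lapply(pieces, `[[`, "s")
d$sum2 <- unsplit(vectors, d$type)
print(d)
```

With plyr loaded, the same extraction step should apply to the list returned by dlply, provided the summarised column is given a name.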
Also, note that I am aware that I can approach this with ddply/transform:
> ddply(d, .(type), transform, sum2=cumsum(val))
type val sum sum2
1 a 0.03012394 0.03012394 0.03012394
2 a 1.11661021 1.14673416 1.14673416
3 a -1.21885742 -0.07212326 -0.07212326
4 a -1.13121857 -1.20334183 -1.20334183
5 a -0.71635849 -1.91970032 -1.91970032
6 b 0.08541773 0.08541773 0.08541773
7 b 1.26736872 1.35278645 1.35278645
8 b -0.74478160 0.60800486 0.60800486
9 b 0.25265237 0.86065723 0.86065723
10 b 0.15204571 1.01270293 1.01270293
This won't work in my case because, as you can see, it has the side effect of rearranging the rows out of their original order. If there's some argument to ddply that would not rearrange the rows, then this would be perfect for my purposes.
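One workaround that does not rely on any ddply argument is to carry an explicit row index through ddply and sort on it afterwards. This is only a sketch: the column name idx and the variable out are made up for illustration, and the data are the rows above entered by hand.

```r
library(plyr)

# The example data from above, entered by hand.
d <- data.frame(
  type = c("a", "b", "a", "a", "b", "b", "a", "a", "b", "b"),
  val  = c(0.03012394, 0.08541773, 1.11661021, -1.21885742, 1.26736872,
           -0.74478160, -1.13121857, -0.71635849, 0.25265237, 0.15204571)
)

# Hypothetical helper column that records the original row position.
d$idx <- seq_len(nrow(d))

# ddply() returns the rows grouped by type, as in the output above.
out <- ddply(d, .(type), transform, sum2 = cumsum(val))

# Sort back into the original order, then drop the helper column.
out <- out[order(out$idx), ]
out$idx <- NULL
rownames(out) <- NULL
print(out)
```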
Upvotes: 1
Views: 288
Reputation: 263372
Why not use ave?
d$sum <- # absolutely terrible name for a variable
ave( d$val, d$type, FUN=cumsum)
The lapply(split(d, d$type), func) approach is overkill for a function that will only operate on one vector at a time.
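As a sketch of how this carries over to other cumulative functions that take a single vector (the column names running_sum and running_max are made up for illustration, and the data are the question's rows entered by hand):

```r
# The question's example data, entered by hand.
d <- data.frame(
  type = c("a", "b", "a", "a", "b", "b", "a", "a", "b", "b"),
  val  = c(0.03012394, 0.08541773, 1.11661021, -1.21885742, 1.26736872,
           -0.74478160, -1.13121857, -0.71635849, 0.25265237, 0.15204571)
)

# ave() applies FUN within each level of d$type and returns a vector
# aligned with the original rows, so the row order is left alone.
d$running_sum <- ave(d$val, d$type, FUN = cumsum)
d$running_max <- ave(d$val, d$type, FUN = cummax)
print(d)
```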
Upvotes: 1
Reputation: 67778
Perhaps you could try dplyr instead? In contrast to ddply, it keeps the original order.
library(dplyr)
# %>% is the pipe in current dplyr; the older %.% operator has been removed
d %>%
  group_by(type) %>%
  mutate(sum = cumsum(val))
# Source: local data frame [10 x 3]
# Groups: type
#
# type val sum
# 1 a 0.03012394 0.03012394
# 2 b 0.08541773 0.08541773
# 3 a 1.11661021 1.14673416
# 4 a -1.21885742 -0.07212326
# 5 b 1.26736872 1.35278645
# 6 b -0.74478160 0.60800486
# 7 a -1.13121857 -1.20334183
# 8 a -0.71635849 -1.91970032
# 9 b 0.25265237 0.86065723
# 10 b 0.15204571 1.01270293
Upvotes: 3