Reputation: 723
I need to use group by in levels with ddply or aggregate if that's easier. I am not really sure how to do this as I need to use cumsum as my aggregate function. This is what my data looks like:
level1 level2 hour product
A tea 0 7
A tea 1 2
A tea 2 9
A coffee 17 7
A coffee 18 2
A coffee 20 4
B coffee 0 2
B coffee 1 3
B coffee 2 4
B tea 21 3
B tea 22 1
expected output:
A tea 0 7
A tea 1 9
A tea 2 18
A coffee 17 7
A coffee 18 9
A coffee 20 13
B coffee 0 2
B coffee 1 5
B coffee 2 9
B tea 21 3
B tea 22 4
I tried using
ddply(dd,c("level1","level2","hour"),summarise,cumsum(product))
but that doesn't sum up which I think is because the hour column is being used for group by and its being split by that..I think.. I am not sure I completely understand how aggregate works here. Is there any way I could get the required output using aggregate or ddply?
Upvotes: 13
Views: 5219
Reputation: 140
Folks referencing this now may wish to avoid using plyr::ddply
. Here is a dplyr
solution.
library(dplyr)
output <- dd %>%
group_by(level1, level2) %>%
mutate(product_sum = cumsum(product)) %>%
ungroup()
output
Note that product_sum
is where the new column is named. You could use product
instead. Then, the original product column would be overwritten as it appears the original question may have preferred to do that and not have both columns in the output.
Data:
level1 <- c(rep("A",6), rep("B",5))
level2 <- c(rep("tea",3), rep("coffee",6), rep("tea",2))
hour <- c(0,1,2,17,18,20,0,1,2,21,22)
product <- c(7,2,9,7,2,4,2,3,4,3,1)
dd <- data.frame(level1, level2, hour, product)
Upvotes: 0
Reputation: 193667
Here's a solution in base R using ave
and within
:
within(mydf, {
cumsumProduct <- ave(product, level1, level2, FUN = cumsum)
})
# level1 level2 hour product cumsumProduct
# 1 A tea 0 7 7
# 2 A tea 1 2 9
# 3 A tea 2 9 18
# 4 A coffee 17 7 7
# 5 A coffee 18 2 9
# 6 A coffee 20 4 13
# 7 B coffee 0 2 2
# 8 B coffee 1 3 5
# 9 B coffee 2 4 9
# 10 B tea 21 3 3
# 11 B tea 22 1 4
Of course, if you wanted to drop the existing product column, you can change the command to the following to overwrite the current "product" column:
within(mydf, {
product <- ave(product, level1, level2, FUN = cumsum)
})
Your current approach doesn't work in part because you've included "hour" as one of your grouping variables. In other words, it is seeing the combination of "A + tea + 0" as different from "A + tea + 1", but from your desired output, you seem to simply want the combination of "A + tea" to be the group.
aggregate
won't work as you expect, because it will condense everything into a data.frame
with the same number of rows as the number of unique combinations of "level1" and "level2", in this case, 4 rows. The aggregated column would be a list
. The values would be correct, but it would be less useful.
Here's aggregate
and its output:
> aggregate(product ~ level1 + level2, mydf, cumsum)
level1 level2 product
1 A coffee 7, 9, 13
2 B coffee 2, 5, 9
3 A tea 7, 9, 18
4 B tea 3, 4
Upvotes: 16
Reputation: 118869
you should use transform
instead of summarise
:
# you should probably order your `level2` first
dd$level2 <- factor(dd$level2, levels=c("tea", "coffee"))
# and transform using level1 and level2 alone, not hour
# if you use hour, the groups will be for each row
ddply(dd, .(level1, level2), transform, product=cumsum(product))
# level1 level2 hour product
# 1 A tea 0 7
# 2 A tea 1 9
# 3 A tea 2 18
# 4 A coffee 17 7
# 5 A coffee 18 9
# 6 A coffee 20 13
# 7 B tea 21 3
# 8 B tea 22 4
# 9 B coffee 0 2
# 10 B coffee 1 5
# 11 B coffee 2 9
Upvotes: 7