Reputation: 13
Hey I'm trying to transition from using ddply to using data.table, and I'm close to figuring it out, but I have slight adjustments that I still need to make. Here's the summary of what I'm trying to do, using a toy data set:
Let's say I have sales data for two products for multiple weeks.
x <- structure(list(week = c(1, 1, 2, 3, 1, 2, 2, 3, 4), product = c("a",
"a", "a", "a", "b", "b", "b", "b", "b"), sold = c(10, 15, 20,
25, 30, 35, 40, 45, 50)), .Names = c("week", "product", "sold"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
), sorted = c("product", "week"))
week product sold
1: 1 a 10
2: 1 a 15
3: 2 a 20
4: 3 a 25
5: 1 b 30
6: 2 b 35
7: 2 b 40
8: 3 b 45
9: 4 b 50
I want to find the total sales for product i for week j, i.e. I sold 25 units of product a for week 1.
I used the following code to do this:
setDT(x)
setkey(x,product,week)
> x1 <- x[x,sum(sold)]
> x1
product week V1
1: a 1 25
2: a 1 25
3: a 2 20
4: a 3 25
5: b 1 30
6: b 2 75
7: b 2 75
8: b 3 45
9: b 4 50
The problem is I'm not sure how to remove the duplicate rows, ie. row 2 is redundant. Additionally, I also want to include NA for weeks where the product didn't sell, ie. a row for product a week 4 with value NA.
I'm sure this is an easy question and I know how to do this in ddply, but I couldn't find what I was looking for through searching. If anyone could help out or link me to the right page if this is a duplicate, that would be great.
Upvotes: 1
Views: 58
Reputation: 49830
Here's how you can cross join by unique products and all weeks, then sum each group.
> x[CJ(unique(product), 1:4), sum(sold), by=.EACHI]
product week V1
1: a 1 25
2: a 2 20
3: a 3 25
4: a 4 NA
5: b 1 30
6: b 2 75
7: b 3 45
8: b 4 50
If you're using data.table version <= 1.9.2
, then just remove the by = .EACHI
part. It's a new design change from the next version on (currently implemented in 1.9.3 devel). Check out the NEWS for more info on this.
Upvotes: 3
Reputation: 54277
Perhaps reshaping is another option:
require(reshape2); require(data.table)
(dt2 <- dcast.data.table(dt, product ~ week, fun.aggregate = sum, value.var = "sold", fill = NA, drop = FALSE))
# product 1 2 3 4
# 1: a 25 20 25 NA
# 2: b 30 75 45 50
(dt3 <- melt(dt2, id.vars = "product", variable.name = "week", value.name = "sold"))
# product week sold
# 1: a 1 25
# 2: b 1 30
# 3: a 2 20
# 4: b 2 75
# 5: a 3 25
# 6: b 3 45
# 7: a 4 NA
# 8: b 4 50
Upvotes: 0