Reputation: 4298
I referred to this thread to Create table with all pairs of values from one column in R, counting unique values and Table of Interactions - Case with pets and houses understand how to create 2-way interaction table. How could I do this for all possible cases? Also, I want to find both frequency of occurrence and revenue in these bins (combinations).
Here's my input data
Customer Product Revenue
1 A Rice 10
2 A Sweet Potato 2
3 A Walnut 4
4 B Rice 3
5 B Walnut 2
6 C Walnut 3
7 C Sweet Potato 4
8 D Rice 3
9 E Sweet Potato 4
10 F Walnut 7
11 G Rice 2
12 G Sweet Potato 3
13 H Sweet Potato 4
14 H Walnut 6
15 I Rice 2
DFI <- structure(list(Customer = c("A", "A", "A", "B", "B", "C", "C",
"D", "E", "F", "G", "G", "H", "H", "I"), Product = c("Rice",
"Sweet Potato", "Walnut", "Rice", "Walnut", "Walnut", "Sweet Potato",
"Rice", "Sweet Potato", "Walnut", "Rice", "Sweet Potato", "Sweet Potato",
"Walnut", "Rice"), Revenue = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7,
2, 3, 4, 6, 2)), .Names = c("Customer", "Product", "Revenue"), row.names = c(NA,
15L), class = "data.frame")
Here's the code to generate all combinations of product Sweet Potato
Rice
and Walnut
:
Combinations<-do.call(c,lapply(seq_along(unique(DFI$Product)),
combn, x = unique(DFI$Product), simplify = FALSE))
[[1]]
[1] "Rice"
[[2]]
[1] "Sweet Potato"
[[3]]
[1] "Walnut"
[[4]]
[1] "Rice" "Sweet Potato"
[[5]]
[1] "Rice" "Walnut"
[[6]]
[1] "Sweet Potato" "Walnut"
[[7]]
[1] "Rice" "Sweet Potato" "Walnut"
Here's my expected output data for frequency of occurrence as per combinations of product types:
Combination Frequency
1 R 2
2 S 1
3 W 1
4 R,S 1
5 S,W 2
6 R,W 1
7 R,S,W 1
DFOUTa <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W",
"R,S,W"), Frequency = c(2, 1, 1, 1, 2, 1, 1)), .Names = c("Combination",
"Frequency"), row.names = c(NA, 7L), class = "data.frame")
Here's my expected output data for revenue in the bins (i.e. combinations of product types):
Combination Revenue
1 R 5
2 S 4
3 W 7
4 R,S 5
5 S,W 17
6 R,W 5
7 R,S,W 16
DFOUTb <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W",
"R,S,W"), Revenue = c(5, 4, 7, 5, 17, 5, 16)), .Names = c("Combination",
"Revenue"), row.names = c(NA, 7L), class = "data.frame")
I have manually generated above data. I have double-checked to ensure no error.
I am unsure how to generate the two output that I am looking for. I'd sincerely appreciate any help. I'd prefer data.table
based approach because of the size of the data I have in my original dataset.
PS: I have shortened Product Names Rice
, Sweet Potato
and Walnut
to R
, S
, W
respectively in the output file for sake of brevity.
Upvotes: 1
Views: 124
Reputation: 66819
I'd do...
# spin off product table, assign abbreviations
prodDF = DFI[, .(Product = unique(Product))][, prod := substr(Product, 1, 1)]
DFI[prodDF, on=.(Product), prod := i.prod]
# spin off customer table, assign their bundles and revenues
custDF = DFI[order(prod), .(Bundle = toString(prod)), keyby=Customer]
custDF[DFI[, sum(Revenue), by=.(Customer)], rev := i.V1]
# aggregate from customers to bundles
res = custDF[, .(.N, rev = sum(rev)), keyby=Bundle]
# clean up extra columns
DFI[, prod := NULL]
which gives
Bundle N rev
1: R 2 5
2: R, S 1 5
3: R, S, W 1 16
4: R, W 1 5
5: S 1 4
6: S, W 2 17
7: W 1 7
This is very similar to @Mako's answer but...
?GForce
when summing revenue, while Mako's summing of revenue at the customer level does not.These don't really make this answer any better, just different. Despite the GForce thing, my way may actually prove slower, since I group by or merge on customer three times vs that answer's single time. And for the second issue, the other answer is probably a one-liner for simplicity / personal taste.
Upvotes: 1
Reputation: 7292
This should get you both Frequency and Revenue - I'm assuming you want to combine each customer's order into a combination:
require(data.table); setDT(DFI)
DFI[order(Product)
][, .(Combination= paste(Product, collapse=", "), Revenue = sum(Revenue)) , by=.(Customer)
][, .(.N, Revenue= sum(Revenue)), by=.(Combination)]
Combination N Revenue
1: Rice, Sweet Potato, Walnut 1 16
2: Rice, Walnut 1 5
3: Rice 2 5
4: Rice, Sweet Potato 1 5
5: Sweet Potato, Walnut 2 17
6: Sweet Potato 1 4
7: Walnut 1 7
You might find it helpful to look at each chained statement one at a time to see what's happening at each step. The only specific thing I'll mention is that we start with DFI[order(Product)]
to make sure that our generated combinations are consistent, so we don't end up with "Rice, Potato" and "Potato, Rice"
Upvotes: 2