Create a table with all possible interactions (2-way and 3-way)

Question

I referred to this thread to Create table with all pairs of values from one column in R, counting unique values and Table of Interactions - Case with pets and houses understand how to create 2-way interaction table. How could I do this for all possible cases? Also, I want to find both frequency of occurrence and revenue in these bins (combinations).

Here's my input data

   Customer      Product Revenue
1         A         Rice      10
2         A Sweet Potato       2
3         A       Walnut       4
4         B         Rice       3
5         B       Walnut       2
6         C       Walnut       3
7         C Sweet Potato       4
8         D         Rice       3
9         E Sweet Potato       4
10        F       Walnut       7
11        G         Rice       2
12        G Sweet Potato       3
13        H Sweet Potato       4
14        H       Walnut       6
15        I         Rice       2

DFI <- structure(list(Customer = c("A", "A", "A", "B", "B", "C", "C", 
"D", "E", "F", "G", "G", "H", "H", "I"), Product = c("Rice", 
"Sweet Potato", "Walnut", "Rice", "Walnut", "Walnut", "Sweet Potato", 
"Rice", "Sweet Potato", "Walnut", "Rice", "Sweet Potato", "Sweet Potato", 
"Walnut", "Rice"), Revenue = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7, 
2, 3, 4, 6, 2)), .Names = c("Customer", "Product", "Revenue"), row.names = c(NA, 
15L), class = "data.frame")

Here's the code to generate all combinations of product Sweet Potato Rice and Walnut:

Combinations<-do.call(c,lapply(seq_along(unique(DFI$Product)), 
  combn, x = unique(DFI$Product), simplify = FALSE))

[[1]]
[1] "Rice"

[[2]]
[1] "Sweet Potato"

[[3]]
[1] "Walnut"

[[4]]
[1] "Rice"         "Sweet Potato"

[[5]]
[1] "Rice"   "Walnut"

[[6]]
[1] "Sweet Potato" "Walnut"      

[[7]]
[1] "Rice"         "Sweet Potato" "Walnut"

Here's my expected output data for frequency of occurrence as per combinations of product types:

  Combination Frequency
1           R         2
2           S         1
3           W         1
4         R,S         1
5         S,W         2
6         R,W         1
7       R,S,W         1

DFOUTa <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W", 
"R,S,W"), Frequency = c(2, 1, 1, 1, 2, 1, 1)), .Names = c("Combination", 
"Frequency"), row.names = c(NA, 7L), class = "data.frame")

Here's my expected output data for revenue in the bins (i.e. combinations of product types):

  Combination Revenue
1           R       5
2           S       4
3           W       7
4         R,S       5
5         S,W      17
6         R,W       5
7       R,S,W      16

DFOUTb <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W", 
"R,S,W"), Revenue = c(5, 4, 7, 5, 17, 5, 16)), .Names = c("Combination", 
"Revenue"), row.names = c(NA, 7L), class = "data.frame")

I have manually generated above data. I have double-checked to ensure no error.

I am unsure how to generate the two output that I am looking for. I'd sincerely appreciate any help. I'd prefer data.table based approach because of the size of the data I have in my original dataset.

PS: I have shortened Product Names Rice, Sweet Potato and Walnut to R, S, W respectively in the output file for sake of brevity.

Mako212 · Accepted Answer

This should get you both Frequency and Revenue - I'm assuming you want to combine each customer's order into a combination:

require(data.table); setDT(DFI)

DFI[order(Product)
  ][, .(Combination= paste(Product, collapse=", "), Revenue = sum(Revenue)) , by=.(Customer)
  ][, .(.N, Revenue= sum(Revenue)), by=.(Combination)]

                  Combination N Revenue
1: Rice, Sweet Potato, Walnut 1      16
2:               Rice, Walnut 1       5
3:                       Rice 2       5
4:         Rice, Sweet Potato 1       5
5:       Sweet Potato, Walnut 2      17
6:               Sweet Potato 1       4
7:                     Walnut 1       7

You might find it helpful to look at each chained statement one at a time to see what's happening at each step. The only specific thing I'll mention is that we start with DFI[order(Product)] to make sure that our generated combinations are consistent, so we don't end up with "Rice, Potato" and "Potato, Rice"

Create a table with all possible interactions (2-way and 3-way)

Answers (2)

Related Questions