watchtower
watchtower

Reputation: 4298

Create a table with all possible interactions (2-way and 3-way)

I referred to this thread to Create table with all pairs of values from one column in R, counting unique values and Table of Interactions - Case with pets and houses understand how to create 2-way interaction table. How could I do this for all possible cases? Also, I want to find both frequency of occurrence and revenue in these bins (combinations).

Here's my input data

   Customer      Product Revenue
1         A         Rice      10
2         A Sweet Potato       2
3         A       Walnut       4
4         B         Rice       3
5         B       Walnut       2
6         C       Walnut       3
7         C Sweet Potato       4
8         D         Rice       3
9         E Sweet Potato       4
10        F       Walnut       7
11        G         Rice       2
12        G Sweet Potato       3
13        H Sweet Potato       4
14        H       Walnut       6
15        I         Rice       2

DFI <- structure(list(Customer = c("A", "A", "A", "B", "B", "C", "C", 
"D", "E", "F", "G", "G", "H", "H", "I"), Product = c("Rice", 
"Sweet Potato", "Walnut", "Rice", "Walnut", "Walnut", "Sweet Potato", 
"Rice", "Sweet Potato", "Walnut", "Rice", "Sweet Potato", "Sweet Potato", 
"Walnut", "Rice"), Revenue = c(10, 2, 4, 3, 2, 3, 4, 3, 4, 7, 
2, 3, 4, 6, 2)), .Names = c("Customer", "Product", "Revenue"), row.names = c(NA, 
15L), class = "data.frame")

Here's the code to generate all combinations of product Sweet Potato Rice and Walnut:

Combinations<-do.call(c,lapply(seq_along(unique(DFI$Product)), 
  combn, x = unique(DFI$Product), simplify = FALSE))

[[1]]
[1] "Rice"

[[2]]
[1] "Sweet Potato"

[[3]]
[1] "Walnut"

[[4]]
[1] "Rice"         "Sweet Potato"

[[5]]
[1] "Rice"   "Walnut"

[[6]]
[1] "Sweet Potato" "Walnut"      

[[7]]
[1] "Rice"         "Sweet Potato" "Walnut"      

Here's my expected output data for frequency of occurrence as per combinations of product types:

  Combination Frequency
1           R         2
2           S         1
3           W         1
4         R,S         1
5         S,W         2
6         R,W         1
7       R,S,W         1

DFOUTa <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W", 
"R,S,W"), Frequency = c(2, 1, 1, 1, 2, 1, 1)), .Names = c("Combination", 
"Frequency"), row.names = c(NA, 7L), class = "data.frame")

Here's my expected output data for revenue in the bins (i.e. combinations of product types):

  Combination Revenue
1           R       5
2           S       4
3           W       7
4         R,S       5
5         S,W      17
6         R,W       5
7       R,S,W      16

DFOUTb <- structure(list(Combination = c("R", "S", "W", "R,S", "S,W", "R,W", 
"R,S,W"), Revenue = c(5, 4, 7, 5, 17, 5, 16)), .Names = c("Combination", 
"Revenue"), row.names = c(NA, 7L), class = "data.frame")

I have manually generated above data. I have double-checked to ensure no error.

I am unsure how to generate the two output that I am looking for. I'd sincerely appreciate any help. I'd prefer data.table based approach because of the size of the data I have in my original dataset.


PS: I have shortened Product Names Rice, Sweet Potato and Walnut to R, S, W respectively in the output file for sake of brevity.

Upvotes: 1

Views: 124

Answers (2)

Frank
Frank

Reputation: 66819

I'd do...

# spin off product table, assign abbreviations
prodDF = DFI[, .(Product = unique(Product))][, prod := substr(Product, 1, 1)]
DFI[prodDF, on=.(Product), prod := i.prod]

# spin off customer table, assign their bundles and revenues
custDF = DFI[order(prod), .(Bundle = toString(prod)), keyby=Customer]    
custDF[DFI[, sum(Revenue), by=.(Customer)], rev := i.V1]

# aggregate from customers to bundles
res = custDF[, .(.N, rev = sum(rev)), keyby=Bundle]

# clean up extra columns
DFI[, prod := NULL]

which gives

    Bundle N rev
1:       R 2   5
2:    R, S 1   5
3: R, S, W 1  16
4:    R, W 1   5
5:       S 1   4
6:    S, W 2  17
7:       W 1   7

This is very similar to @Mako's answer but...

  1. Both of my aggregations use ?GForce when summing revenue, while Mako's summing of revenue at the customer level does not.
  2. This way leaves behind the customer table that you can inspect or merge with other customer attributes (if any); and ditto for the product table.

These don't really make this answer any better, just different. Despite the GForce thing, my way may actually prove slower, since I group by or merge on customer three times vs that answer's single time. And for the second issue, the other answer is probably a one-liner for simplicity / personal taste.

Upvotes: 1

Mako212
Mako212

Reputation: 7292

This should get you both Frequency and Revenue - I'm assuming you want to combine each customer's order into a combination:

require(data.table); setDT(DFI)

DFI[order(Product)
  ][, .(Combination= paste(Product, collapse=", "), Revenue = sum(Revenue)) , by=.(Customer)
  ][, .(.N, Revenue= sum(Revenue)), by=.(Combination)]

                  Combination N Revenue
1: Rice, Sweet Potato, Walnut 1      16
2:               Rice, Walnut 1       5
3:                       Rice 2       5
4:         Rice, Sweet Potato 1       5
5:       Sweet Potato, Walnut 2      17
6:               Sweet Potato 1       4
7:                     Walnut 1       7

You might find it helpful to look at each chained statement one at a time to see what's happening at each step. The only specific thing I'll mention is that we start with DFI[order(Product)] to make sure that our generated combinations are consistent, so we don't end up with "Rice, Potato" and "Potato, Rice"

Upvotes: 2

Related Questions