davidzxc574

R: how to incorporate item categories into transactions data

In R, I would like to create a transactions object from the following data frame so that I can run apriori() from the arules package. The data frame has transaction IDs, item IDs, and category IDs (the parent category of each item).

Transaction_ID Item_ID Category_ID
T01            A001    A01
T01            A002    A01
T02            A001    A01
T02            A003    A02
T02            A002    A01
T03            A005    A03
T05            A004    A03
T05            A002    A01
T05            A005    A03
T04            A001    A01
T04            A003    A02

I would like to incorporate the category IDs into the transactions object as a level above the item labels, the same way the Groceries data does:

str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
  .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
  .. .. ..@ Dim     : int [1:2] 169 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 169 obs. of  3 variables:
  .. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
  .. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
  .. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables

However, read.transactions() only lets me import the transaction ID and item ID columns (via its cols parameter). I have also tried this:

transaction_by_item<-split(df[,c("Item_ID","Category_ID")],df$Transaction_ID)
basket <- as(transaction_by_item, "transactions")

but it failed with the error:

Error in asMethod(object) : can coerce list with atomic components only

It works if I split only the item IDs by transaction:

transaction_by_item <- split(df$Item_ID, df$Transaction_ID)
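A likely explanation, sketched in base R on a hypothetical three-row subset of the data: split() applied to a data frame returns a list of data frames, whereas the error message says the coercion to "transactions" only accepts list components that are atomic vectors. Splitting a single column gives exactly that.

```r
# Hypothetical three-row subset of the question's data frame
df <- data.frame(
  Transaction_ID = c("T01", "T01", "T02"),
  Item_ID        = c("A001", "A002", "A001"),
  Category_ID    = c("A01", "A01", "A01"),
  stringsAsFactors = FALSE
)

# Splitting a data frame returns a list of data frames (not atomic) ...
by_tx_df <- split(df[, c("Item_ID", "Category_ID")], df$Transaction_ID)
class(by_tx_df[["T01"]])
#> [1] "data.frame"

# ... while splitting a single column returns a list of atomic vectors
by_tx_vec <- split(df$Item_ID, df$Transaction_ID)
is.atomic(by_tx_vec[["T01"]])
#> [1] TRUE
```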

Does anyone know how to incorporate both the item IDs (labels) and the category IDs (level) when creating a transactions object? Thanks.


Answers (1)

s__

Maybe this can help. First, let's introduce the arules function itemInfo(), which returns the item metadata of a transactions object as a data frame:

library(arules)
data("Groceries")
head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage

As you noted, Groceries has two extra levels (level2 and level1) next to labels. Yours, built from the item IDs only, has just the labels:

trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
  labels
1   A001
2   A002
3   A003
4   A004
5   A005

Now you have to add the categories to your data. Look them up by item label, so that each category lines up with the item order already stored in trans4 (distinct() returns items in order of first appearance, so A005 comes before A004):

library(dplyr)
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
# position of each trans4 item within labels_, in trans4's own item order
idx <- match(itemLabels(trans4), labels_$Item_ID)
itemInfo(trans4) <- data.frame(labels = itemLabels(trans4),
                               level1 = labels_$Category_ID[idx],
                               stringsAsFactors = FALSE)

Now:

itemInfo(trans4)
  labels level1
1   A001    A01
2   A002    A01
3   A003    A02
4   A004    A03
5   A005    A03

And:

str(trans4)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:11] 0 1 0 1 2 4 0 2 1 3 ...
  .. .. ..@ p       : int [1:6] 0 2 5 6 8 11
  .. .. ..@ Dim     : int [1:2] 5 5
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 5 obs. of  2 variables:
  .. ..$ labels: chr [1:5] "A001" "A002" "A003" "A004" ...
  .. ..$ level1: Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3    # here we go!!!
  ..@ itemsetInfo:'data.frame': 5 obs. of  1 variable:
  .. ..$ transactionID: chr [1:5] "T01" "T02" "T03" "T04" ...
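Why the category lookup should be matched on the label rather than assigned row by row: distinct() (like base unique()) keeps rows in order of first appearance, so the lookup table lists A005 before A004, while the itemInfo(trans4) output shown earlier lists the items as A001 to A005. Assigning the lookup rows positionally would attach the A005 label to the A004 column and vice versa. The base-R sketch below reproduces this on the question's rows; item_labels is a hypothetical stand-in for itemLabels(trans4), assumed sorted as in that earlier output.

```r
# The question's rows, as plain character columns
df <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02"),
  stringsAsFactors = FALSE
)

# One row per item, in order of first appearance (what distinct() returns)
lookup <- unique(df[, c("Item_ID", "Category_ID")])
lookup$Item_ID
#> [1] "A001" "A002" "A003" "A005" "A004"

# Hypothetical stand-in for itemLabels(trans4): items in column order
item_labels <- sort(unique(df$Item_ID))

# Look up each label's category instead of relying on row order
level1 <- lookup$Category_ID[match(item_labels, lookup$Item_ID)]
data.frame(labels = item_labels, level1 = level1)
```

In this particular data A004 and A005 share category A03, so level1 happens to come out the same either way; the labels column is what a positional assignment would get wrong.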

With data:

dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L, 
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04", 
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L, 
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003", 
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L, 
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02", 
"A03"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))
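If you would rather have apriori() mine rules that mix items and categories, one alternative (not what the answer above does) is to add each item's category as an extra item in its transaction. This also sidesteps the "atomic components only" error, because every list element stays a character vector. A base-R sketch, assuming item IDs and category IDs never share a string; the coercion line needs arules and is left commented out. arules also documents aggregate() and addAggregate() for hierarchies stored in itemInfo(), which may be worth checking against your installed version.

```r
# The question's rows, as plain character columns
df <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02"),
  stringsAsFactors = FALSE
)

# Hypothetical alternative: each transaction holds its items plus the
# (deduplicated) categories of those items, as one character vector
items_with_cats <- lapply(
  split(df, df$Transaction_ID),
  function(d) unique(c(d$Item_ID, d$Category_ID))
)
items_with_cats[["T02"]]

# With arules loaded, this list of atomic vectors can be coerced:
# trans_mixed <- as(items_with_cats, "transactions")
```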
