In R, I would like to create a transactions object from the following data frame so that I can run apriori from the arules package. It has transaction IDs, item IDs, and category IDs (the parents of the items).
Transaction_ID Item_ID Category_ID
T01 A001 A01
T01 A002 A01
T02 A001 A01
T02 A003 A02
T02 A002 A01
T03 A005 A03
T05 A004 A03
T05 A002 A01
T05 A005 A03
T04 A001 A01
T04 A003 A02
I would like to incorporate the category IDs as a level above the labels (items) in the transactions data, as in the Groceries data.
str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
However, read.transactions only lets you import the transaction ID and item ID, via the cols parameter. I have also tried this:
transaction_by_item<-split(df[,c("Item_ID","Category_ID")],df$Transaction_ID)
basket <- as(transaction_by_item, "transactions")
and it gave this error:
Error in asMethod(object) : can coerce list with atomic components only
It works if I split the transactions by item IDs only:
transaction_by_item<-split(df$Item_ID,df$Transaction_ID)
Does anyone know how to incorporate both item IDs (labels) and category IDs (levels) when creating a transactions object? Thanks.
Upvotes: 1
Maybe this can help. First, let's look at the arules function itemInfo():
library(arules)
data("Groceries")  # load the example data set shipped with arules
head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage
Now, as you stated, Groceries has a couple of levels. Your data, on the other hand, has only labels:
trans4 <- as(split(dats[,"Item_ID"], dats[,"Transaction_ID"]), "transactions")
str(trans4)
itemInfo(trans4)
labels
1 A001
2 A002
3 A003
4 A004
5 A005
Now you have to add the category level to your data, which you can do like this:
library(dplyr)
labels_ <- dats %>% select(Item_ID, Category_ID) %>% distinct()
# distinct() keeps order of first appearance (A005 before A004), so reorder the
# rows to match the item order in trans4 before replacing itemInfo
labels_ <- labels_[match(itemLabels(trans4), labels_$Item_ID), ]
itemInfo(trans4) <- data.frame(labels = labels_$Item_ID, level1 = labels_$Category_ID)
Now:
itemInfo(trans4)
labels level1
1 A001 A01
2 A002 A01
3 A003 A02
4 A004 A03
5 A005 A03
And:
str(trans4)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:11] 0 1 0 1 2 4 0 2 1 3 ...
.. .. ..@ p : int [1:6] 0 2 5 6 8 11
.. .. ..@ Dim : int [1:2] 5 5
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 5 obs. of 2 variables:
.. ..$ labels: Factor w/ 5 levels "A001","A002",..: 1 2 3 4 5
.. ..$ level1: Factor w/ 3 levels "A01","A02","A03": 1 1 2 3 3 # here we go!!!
..@ itemsetInfo:'data.frame': 5 obs. of 1 variable:
.. ..$ transactionID: chr [1:5] "T01" "T02" "T03" "T04" ...
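Once level1 is in itemInfo, the category level can be used much like the levels in Groceries. Below is a minimal sketch, assuming a recent arules version in which aggregate() accepts the name of an itemInfo column in by. The data frame is rebuilt from the table in the question, and the names trans, lookup and trans_cat are only illustrative:

```r
library(arules)

# Rebuild the example data from the question (plain character columns).
dats <- data.frame(
  Transaction_ID = c("T01", "T01", "T02", "T02", "T02", "T03",
                     "T05", "T05", "T05", "T04", "T04"),
  Item_ID        = c("A001", "A002", "A001", "A003", "A002", "A005",
                     "A004", "A002", "A005", "A001", "A003"),
  Category_ID    = c("A01", "A01", "A01", "A02", "A01", "A03",
                     "A03", "A01", "A03", "A01", "A02"),
  stringsAsFactors = FALSE
)

# One transaction per Transaction_ID, containing its item IDs.
trans <- as(split(dats$Item_ID, dats$Transaction_ID), "transactions")

# One row per item, reordered to match the item order inside trans.
lookup <- unique(dats[, c("Item_ID", "Category_ID")])
lookup <- lookup[match(itemLabels(trans), lookup$Item_ID), ]
itemInfo(trans) <- data.frame(labels = lookup$Item_ID,
                              level1 = factor(lookup$Category_ID))

# Roll the items up to their categories (one item per level1 value).
trans_cat <- aggregate(trans, by = "level1")
inspect(trans_cat)
```

Both trans and trans_cat are ordinary transactions objects, so either one can be passed to apriori().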
With data:
dats <- structure(list(Transaction_ID = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 5L, 5L, 5L, 4L, 4L), .Label = c("T01", "T02", "T03", "T04",
"T05"), class = "factor"), Item_ID = structure(c(1L, 2L, 1L,
3L, 2L, 5L, 4L, 2L, 5L, 1L, 3L), .Label = c("A001", "A002", "A003",
"A004", "A005"), class = "factor"), Category_ID = structure(c(1L,
1L, 1L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 2L), .Label = c("A01", "A02",
"A03"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))
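Replacing itemInfo this way assumes that every Item_ID belongs to exactly one Category_ID. A quick base R check that can be run before building the lookup, as a sketch only: the helper name check_item_categories and the small demo data frame are hypothetical, not part of arules or of the data above.

```r
# Hypothetical helper: stop if any item maps to more than one category,
# otherwise return the unique item/category pairs invisibly.
check_item_categories <- function(df, item = "Item_ID", category = "Category_ID") {
  pairs <- unique(df[, c(item, category)])
  n_cat <- table(as.character(pairs[[item]]))
  bad   <- names(n_cat)[n_cat > 1]
  if (length(bad) > 0) {
    stop("Items with more than one category: ", paste(bad, collapse = ", "))
  }
  invisible(pairs)
}

# Small illustrative input with the same column names as in the question.
demo <- data.frame(
  Item_ID     = c("A001", "A002", "A001", "A003"),
  Category_ID = c("A01",  "A01",  "A01",  "A02")
)
pairs <- check_item_categories(demo)
```

Calling check_item_categories(dats) on the data above should pass silently, since each item there has a single category.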
Upvotes: 2