I am learning pattern mining and want to do multi-level association rule mining. My dataset contains 25035 unique transactions, each consisting of 1 to 12 items. In total there are 3788 products, 18 subcategories and 3 categories. My problem is adding the aggregation levels (hierarchies) to the transaction data: whichever method I try, I always get an error saying that the number of item labels does not match the number of columns.
In the end, I want my data to look something like the Groceries data set that ships with arules:
> library(arules)
> data(Groceries)
> head(itemInfo(Groceries))
labels level2 level1
1 frankfurter sausage meat and sausage
2 sausage sausage meat and sausage
3 liver loaf sausage meat and sausage
4 ham sausage meat and sausage
5 meat sausage meat and sausage
6 finished products sausage meat and sausage
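The Groceries output illustrates the rule behind the error message: itemInfo() is a data frame with exactly one row per item (column) of the transactions object, with a `labels` column listing the items in column order. The base-R sketch below checks a candidate data frame against that rule before assigning it. `check_item_info()` is a hypothetical helper written for illustration, not an arules function; the row-count test mirrors the error message, while the order test is an extra assumption-based safety check (a wrongly ordered data frame of the right length may be accepted but attach the wrong levels to items).

```r
# Hypothetical helper (not part of arules): checks whether `info` could
# serve as itemInfo() for items whose column labels are `item_labels`.
check_item_info <- function(item_labels, info) {
  if (!"labels" %in% names(info)) {
    return("itemInfo needs a 'labels' column")
  }
  # The condition behind "item labels do not match number of columns".
  if (nrow(info) != length(item_labels)) {
    return(sprintf("itemInfo has %d rows but there are %d items",
                   nrow(info), length(item_labels)))
  }
  # Extra safety check: rows must follow the column order of the items.
  if (!identical(as.character(info$labels), item_labels)) {
    return("labels are not in the same order as the item columns")
  }
  "ok"
}

# Toy example: three items, but the candidate table has one row per
# purchase (one product appears twice), so the row count is wrong.
items <- c("Tenex Box, Single Width", "Tenex Lockers, Blue",
           "Acme Trimmer, High Speed")
per_purchase <- data.frame(
  labels = c("Tenex Lockers, Blue", "Acme Trimmer, High Speed",
             "Tenex Box, Single Width", "Tenex Lockers, Blue"),
  level1 = c("Storage", "Supplies", "Storage", "Storage"),
  stringsAsFactors = FALSE
)
check_item_info(items, per_purchase)
```

Here the call reports 4 rows against 3 items, which is the same kind of mismatch as in the error further down.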
The raw data set looks like this:
Order.ID Sub.Category Product.Name Category
1 AG-2011-2040 Storage Tenex Lockers- Blue 2
2 IN-2011-47883 Supplies Acme Trimmer- High Speed 2
3 HU-2011-1220 Storage Tenex Box- Single Width 2
4 IT-2011-3647632 Paper Enermax Note Cards- Premium 2
5 IN-2011-47883 Furnishings Eldon Light Bulb- Duo Pack 1
6 IN-2011-47883 Paper Eaton Computer Printout Paper- 8.5 x 11 2
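Each order line in the raw data repeats the product's subcategory and category, so the hierarchy itself can be pulled out by keeping one row per product. A minimal base-R sketch, assuming each product belongs to exactly one subcategory and category; `build_lookup()` is a hypothetical helper, and the sample rows are the first three lines of the printout plus one hypothetical repeat (order ID `XX-0000-0001` is made up).

```r
# Hypothetical helper: one row per product with its hierarchy levels.
build_lookup <- function(raw) {
  lookup <- unique(raw[, c("Product.Name", "Sub.Category", "Category")])
  # A product listed under two different subcategories would make the
  # hierarchy ambiguous, so stop early if that happens.
  dup <- duplicated(lookup$Product.Name)
  if (any(dup)) {
    stop("products with more than one category: ",
         paste(unique(lookup$Product.Name[dup]), collapse = "; "))
  }
  rownames(lookup) <- NULL
  lookup
}

# Toy data: first three rows of the printout plus a hypothetical repeat
# of the first product in another order.
raw <- data.frame(
  Order.ID     = c("AG-2011-2040", "IN-2011-47883", "HU-2011-1220",
                   "XX-0000-0001"),
  Sub.Category = c("Storage", "Supplies", "Storage", "Storage"),
  Product.Name = c("Tenex Lockers- Blue", "Acme Trimmer- High Speed",
                   "Tenex Box- Single Width", "Tenex Lockers- Blue"),
  Category     = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)
lookup <- build_lookup(raw)
lookup
```

On the full data this table should have one row per product (3788 rows if every product is unambiguous), independent of the number of order lines.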
I have tried following this tutorial, without success. I loaded the data following its instructions, and it looks like this.
Code to get the data:
dats2 = data.frame(TID    = stores2$Order.ID,
                   ItemID = stores2$Product.Name,
                   CatID  = stores2$Sub.Category,
                   stringsAsFactors = FALSE)
The resulting data frame looks like this:
> head(dats2)
TID ItemID CatID
1 AG-2011-2040 Tenex Lockers, Blue Storage
2 IN-2011-47883 Acme Trimmer, High Speed Supplies
3 HU-2011-1220 Tenex Box, Single Width Storage
4 IT-2011-3647632 Enermax Note Cards, Premium Paper
5 IN-2011-47883 Eldon Light Bulb, Duo Pack Furnishings
6 IN-2011-47883 Eaton Computer Printout Paper, 8.5 x 11 Paper
I then convert the data into transactions:
trans5 = as(split(dats2[,"ItemID"], dats2[,"TID"]), "transactions")
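The split() call yields one list element per order, and the resulting transactions object has one column per distinct product, not one per row of dats2. The base-R sketch below makes the three sizes explicit on a toy copy of the six rows printed above plus one hypothetical extra row (a second purchase of "Tenex Lockers, Blue"); in the real data these correspond to 51288 order lines, 25035 orders and 3788 products. Dropping repeated products within an order with unique() is a defensive assumption, since a binary transaction matrix cannot record quantities.

```r
# Toy copy of the first six rows of dats2 shown above, plus one
# hypothetical extra row repeating a product in another order.
dats2 <- data.frame(
  TID    = c("AG-2011-2040", "IN-2011-47883", "HU-2011-1220",
             "IT-2011-3647632", "IN-2011-47883", "IN-2011-47883",
             "HU-2011-1220"),
  ItemID = c("Tenex Lockers, Blue", "Acme Trimmer, High Speed",
             "Tenex Box, Single Width", "Enermax Note Cards, Premium",
             "Eldon Light Bulb, Duo Pack",
             "Eaton Computer Printout Paper, 8.5 x 11",
             "Tenex Lockers, Blue"),
  CatID  = c("Storage", "Supplies", "Storage", "Paper", "Furnishings",
             "Paper", "Storage"),
  stringsAsFactors = FALSE
)

# One basket per order; unique() removes a product listed twice in the
# same order.
baskets <- lapply(split(dats2$ItemID, dats2$TID), unique)

n_lines  <- nrow(dats2)                   # rows of dats2 (order lines)
n_orders <- length(baskets)               # rows of the transactions object
n_items  <- length(unique(dats2$ItemID))  # columns of the transactions object
c(lines = n_lines, orders = n_orders, items = n_items)
```

Any itemInfo data frame has to have `n_items` rows; one built directly from columns of dats2 has `n_lines` rows instead.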
I cannot find a way to add the labels and levels through itemInfo. When I run the following code, I get an error message:
> itemInfo(trans5) = data.frame(labels = label2$ItemID, level1 = label2$CatID, stringsAsFactors = F)
Error in validObject(object) :
invalid class “transactions” object: item labels do not match number of columns
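`label2` is not shown in the question, but if it has one row per order line (like dats2), the data frame assigned to itemInfo() has far more rows than trans5 has columns, which matches this error. One way around it, sketched below under the assumption that each product belongs to exactly one subcategory and category, is to take the labels from the transactions object itself with itemLabels() and look up each label's levels with match(), so the rows come out in column order by construction. `make_item_info()` and the toy order lines are hypothetical; the arules part only runs if the package is installed, and the `level2`/`level1` names simply mirror the Groceries example.

```r
# Hypothetical helper: builds an itemInfo data frame aligned with the item
# columns. `order_lines` has one row per order line (like dats2);
# `level_cols` maps new level names to columns of `order_lines`.
make_item_info <- function(item_labels, order_lines, item_col, level_cols) {
  # First order line mentioning each product (assumes one category each).
  idx <- match(item_labels, order_lines[[item_col]])
  if (anyNA(idx)) stop("some item labels were not found in the data")
  info <- data.frame(labels = item_labels, stringsAsFactors = FALSE)
  for (nm in names(level_cols)) {
    info[[nm]] <- order_lines[[level_cols[[nm]]]][idx]
  }
  info
}

# Toy order lines in the shape of dats2, plus a hypothetical category column.
order_lines <- data.frame(
  TID    = c("AG-2011-2040", "IN-2011-47883", "HU-2011-1220", "IN-2011-47883"),
  ItemID = c("Tenex Lockers, Blue", "Acme Trimmer, High Speed",
             "Tenex Box, Single Width", "Eldon Light Bulb, Duo Pack"),
  CatID  = c("Storage", "Supplies", "Storage", "Furnishings"),
  Cat    = c("2", "2", "2", "1"),
  stringsAsFactors = FALSE
)
level_cols <- c(level2 = "CatID", level1 = "Cat")

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  baskets <- lapply(split(order_lines$ItemID, order_lines$TID), unique)
  trans <- as(baskets, "transactions")
  # Labels are taken from the transactions object, so count and order match.
  itemInfo(trans) <- make_item_info(itemLabels(trans), order_lines,
                                    "ItemID", level_cols)
  print(itemInfo(trans))
}
```

With the real data the same call would be `make_item_info(itemLabels(trans5), dats2, "ItemID", ...)`, which should return one row per product however many order lines there are. Recent arules versions document `aggregate()` and `addAggregate()` with a `by` argument that can name an itemInfo column; check the documentation of the installed version before relying on them.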
I understand the concept and the logic behind it, but I cannot get it to work in R with a real dataset rather than small "dummy" data.
Thanks in advance for any tips on solving the issue.