Michal Jeznach

Adding hierarchy to transaction data in R

I am learning pattern mining and want to do multi-level association mining. My dataset contains 25035 unique transactions, and each transaction consists of 1 to 12 items. In total, there are 3788 products, 18 subcategories and 3 categories. My problem is adding the aggregation levels (hierarchies) to the transaction data: whichever method I try, I always get an error about a mismatch between the number of item labels and the number of columns.

I want my data to look something like this in the end:

library(arules)

> head(itemInfo(Groceries))
             labels  level2           level1
1       frankfurter sausage meat and sausage
2           sausage sausage meat and sausage
3        liver loaf sausage meat and sausage
4               ham sausage meat and sausage
5              meat sausage meat and sausage
6 finished products sausage meat and sausage

Raw data set looks like this:

         Order.ID Sub.Category                            Product.Name Category
1    AG-2011-2040      Storage                     Tenex Lockers- Blue        2
2   IN-2011-47883     Supplies                Acme Trimmer- High Speed        2
3    HU-2011-1220      Storage                 Tenex Box- Single Width        2
4 IT-2011-3647632        Paper             Enermax Note Cards- Premium        2
5   IN-2011-47883  Furnishings              Eldon Light Bulb- Duo Pack        1
6   IN-2011-47883        Paper Eaton Computer Printout Paper- 8.5 x 11        2

I have tried following this tutorial but without any success.

I loaded the data as described in the tutorial, using the following code:

dats2 = structure(list(TID = structure(c(stores2$Order.ID),
                                   .Label = c(unique(stores2$Order.ID)),
                                   class = "character"),
                   ItemID = structure(c(stores2$Product.Name),
                                      .Label = c(unique(stores2$Product.Name)),
                                      class = "character"),
                   CatID = structure(c(stores2$Sub.Category),
                                     .Label = c(unique(stores2$Sub.Category)),
                                     class = "character")),
              class = "data.frame", row.names = c(NA, -51288L))

and the structure is as follows:

> head(dats2)
              TID                                  ItemID       CatID
1    AG-2011-2040                     Tenex Lockers, Blue     Storage
2   IN-2011-47883                Acme Trimmer, High Speed    Supplies
3    HU-2011-1220                 Tenex Box, Single Width     Storage
4 IT-2011-3647632             Enermax Note Cards, Premium       Paper
5   IN-2011-47883              Eldon Light Bulb, Duo Pack Furnishings
6   IN-2011-47883 Eaton Computer Printout Paper, 8.5 x 11       Paper

After transforming the data into transactions:

trans5 = as(split(dats2[,"ItemID"], dats2[,"TID"]), "transactions")

I cannot find a way to add the labels and levels through itemInfo. When I run the following, I get an error message:

> itemInfo(trans5) = data.frame(labels = label2$ItemID, level1 = label2$CatID, stringsAsFactors = F)
Error in validObject(object) : 
  invalid class “transactions” object: item labels do not match number of columns
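
For context, this validity check fails when the data frame passed to itemInfo does not have exactly one row per item column of the transactions object. One likely cause, though it cannot be confirmed without seeing label2, is a lookup table with one row per purchase line rather than one row per unique product. The following is a minimal sketch on toy data, not a verified fix for the real dataset. The toy values in stores2, and the names trans, lookup, idx and trans_l2, are assumptions made up for illustration. It also assumes every product belongs to exactly one subcategory and that no product appears twice in the same order.

```r
library(arules)

# Toy stand-in for the raw data (hypothetical values, same column names as above)
stores2 <- data.frame(
  Order.ID     = c("A1", "A1", "B2", "B2", "C3"),
  Product.Name = c("Tenex Lockers", "Acme Trimmer", "Tenex Lockers",
                   "Enermax Note Cards", "Acme Trimmer"),
  Sub.Category = c("Storage", "Supplies", "Storage", "Paper", "Supplies"),
  Category     = c(2, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Build transactions: one list element per order, holding its product names
trans <- as(split(stores2$Product.Name, stores2$Order.ID), "transactions")

# Lookup table with one row per unique product (not one row per purchase line)
lookup <- unique(stores2[, c("Product.Name", "Sub.Category", "Category")])

# Reorder the lookup so it follows the item order used inside 'trans'
idx <- match(itemLabels(trans), lookup$Product.Name)

# Now the data frame has exactly ncol(trans) rows, so the validity check passes
itemInfo(trans) <- data.frame(
  labels = itemLabels(trans),
  level2 = lookup$Sub.Category[idx],
  level1 = lookup$Category[idx],
  stringsAsFactors = FALSE
)

# Roll the items up to the subcategory level for multi-level mining
trans_l2 <- aggregate(trans, by = factor(itemInfo(trans)$level2))
```

The key design choice is to derive the row order from itemLabels(trans) instead of from the raw data frame, so the hierarchy columns line up with the item columns regardless of how the source rows were sorted.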

I understand the concept and the logic behind it, but I cannot get it to work in R with a real dataset rather than just small "dummy" data.

Thanks in advance for any tips in solving the issue.

Upvotes: 1

Views: 82
