Reputation: 23
I have a large data set (3.5 million observations and 185 variables) that I'm running market basket analysis on with apriori(). Most of the columns hold a yes/no result. I've converted my data frame correctly, but for some of the yes/no columns one of the factor levels (usually the yes) will sporadically fail with Error in asMethod(object) : variable is an unknown item label, or it won't write any rules, while the other columns run fine. Because my file is so large, I need to narrow down the rules I generate via an lhs = specification, hence my concern about the sporadic behaviour.
I've checked that the label exists in my data frame (it does), and I even went so far as to re-factor the column in case that was the issue. When I run labels() on my transaction data I can't find any entries with the problematic label, despite table() showing that some exist. However, I don't have an efficient way to search all the transaction data, so I only looked through a few hundred transactions; the label could still be in there.
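One way to search every transaction at once, rather than eyeballing a few hundred, is to look at the item labels and item counts that arules stores for the whole object. The sketch below uses a small toy object named toy as a stand-in for tr; its labels are made up to mimic the "column.value" naming described in this question, and isCallable is a hypothetical column.

```r
library(arules)

# Toy transactions standing in for `tr` (hypothetical data, not the real file).
toy <- as(list(
  c("isInterestBearing.y", "isCallable.n"),
  c("isInterestBearing.n", "isCallable.n"),
  c("isInterestBearing.y", "isCallable.y")
), "transactions")

# Every distinct item label that mentions the column name.
hits <- grep("isInterestBearing", itemLabels(toy), value = TRUE, fixed = TRUE)
print(hits)

# Absolute counts of those items across all transactions, not just a sample.
counts <- itemFrequency(toy, type = "absolute")[hits]
print(counts)
```

If hits comes back empty on the real data, the label never made it into the transactions object, which would point at the import step rather than at apriori().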
My CSV is a data frame with one row per transaction and one column per basket item. It's not as wide as it could be because the yes/no values share a column. I've also prefixed each cell with its column name and a . to make the rules easier to read. df2 is the same as ExportMD1.csv.
Here's my data conversion:
tr <- read.transactions('ExportMD1.csv', format = 'basket', sep = ',', cols = 185, header = TRUE)
I'll use isInterestBearing as an example. The table shows that there are 'yes' values:
table(df2$isInterestBearing)
isInterestBearing.n 69745
isInterestBearing.y 276824
I get one of two outputs when I run the following code:
rules <- apriori(tr, paramete = list(supp = 0.5, conf = 0.8, minlen = 2), appearance = list(lhs= "isInterestBearing"))
Option 1
Error in asMethod(object) : isInterestBearing is an unknown item label
4. stop(paste(indicator[!indicator %in% from$labels], "is an unknown item label", collapse = ", "))
3. asMethod(object)
2. as(c(appearance, list(labels = itemLabels(data))), "APappearance")
1. apriori(tr, paramete = list(supp = 0.5, conf = 0.8, minlen = 2), appearance = list(lhs = "isInterestBearing"))
Option 2
Parameter specification:
Algorithmic control:
Absolute minimum support count: 173284
set item appearances ...[1 item(s)] done [0.04s].
set transactions ...[430165 item(s), 346569 transaction(s)] done [24.73s].
sorting and recoding items ... [177 item(s)] done [0.97s].
creating transaction tree ... done [1.35s].
checking subsets of size 1 done [0.02s].
writing ... [0 rule(s)] done [0.04s].
creating S4 object ... done [0.22s].
There's no difference in the data frame or the read.transactions() call when these issues occur.
Ideally apriori() would run consistently without any errors. I suspect the reason I'm not getting any rules for some columns is that the counts are so low, but I have no idea why the labels aren't being reliably recognized.
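The "counts are too low" suspicion can be checked with plain arithmetic on the numbers already shown above. This is only a sketch: it assumes the minimum count is the support threshold times the number of transactions, rounded down, which matches the 173284 reported in the log.

```r
n_transactions <- 346569                  # from the "set transactions" log line
min_count <- floor(0.5 * n_transactions)  # supp = 0.5; rounding down is an assumption

# Counts copied from the table() output in the question.
counts <- c(isInterestBearing.n = 69745, isInterestBearing.y = 276824)

support  <- counts / n_transactions       # relative support of each single item
frequent <- counts >= min_count           # can the item appear in any rule at all?

print(min_count)
print(round(support, 3))
print(frequent)
```

An item below the minimum count cannot appear in any rule, because every itemset containing it is at least as rare. Even an item that clears the threshold on its own only yields a rule if the combined lhs and rhs itemset also clears it, so supp = 0.5 is a demanding setting for a restricted lhs.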
Upvotes: 0
Views: 166
Reputation: 3075
I think you are just not using the right item label in appearance. Check what item labels you have in your transactions with
itemLabels(tr)
The correct item label will be something like isInterestBearing=y.
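As a rough sketch of that advice, you can pull the exact labels out of the transactions and pass them to appearance instead of typing the column name by hand. The toy object below is hypothetical: it uses the "column.value" naming from the question and a made-up isCallable column, so the labels in your real data may look different.

```r
library(arules)

# Toy transactions standing in for `tr` (hypothetical data).
toy <- as(list(
  c("isInterestBearing.y", "isCallable.n"),
  c("isInterestBearing.y", "isCallable.n"),
  c("isInterestBearing.n", "isCallable.y"),
  c("isInterestBearing.y", "isCallable.n")
), "transactions")

# Take the exact labels from the data so they are guaranteed to exist.
lhs_items <- grep("^isInterestBearing", itemLabels(toy), value = TRUE)

rules <- apriori(
  toy,
  parameter  = list(supp = 0.5, conf = 0.8, minlen = 2),
  # Listed items may only appear on the lhs; everything else only on the rhs.
  appearance = list(lhs = lhs_items, default = "rhs"),
  control    = list(verbose = FALSE)
)

inspect(rules)
```

Because lhs_items is built from itemLabels(), the "unknown item label" error cannot come from a mistyped or incomplete label; an empty lhs_items would instead tell you the item is missing from the transactions.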
Upvotes: 0