CscienceTM

Reputation: 1

Incorporating Item quantity in the transactions for Apriori algorithm

I was working on a simple recommender system and started with the Apriori algorithm, using arules in R. To my surprise, I got 0 rules whenever the support threshold was greater than 0.0001, which is already a very low value for support. I suspect the reason is that duplicate items in each transaction are being removed. I tried to solve this by setting rm.duplicates to FALSE:

df = read.transactions("transactions.csv",sep = ',',rm.duplicates = FALSE)

But that didn't work, and I got the following warning:

Warning message:
 In asMethod(object) : removing duplicated items in transactions  

So is there a way to solve this, or is there a better way to take the quantity of each item in every transaction into account in the code? Is there a better option in Python or any other language? It would be great if anyone could help me out with this.

Upvotes: 0

Views: 749

Answers (2)

mbros

Reputation: 1

I've just encountered this question today; maybe someone else will stumble upon it as well. However, in my case, the support metric is not the main parameter.

I'm working with transactional data from a supermarket, and you can expect a LOT of different items, around 20k. Even when grouping, you will encounter very low levels of support. What I'm doing is removing transactions with only 1 item (I can't find rules if there is no consequent) and filtering my rules by lift.
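A rough sketch of that filtering in plain Python, not arules. The `support` and `lift` helpers and the toy baskets are made up for illustration; lift is taken in its usual sense, support(A and B) / (support(A) * support(B)).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def lift(antecedent, consequent, transactions):
    """lift(A -> B) = support(A and B) / (support(A) * support(B))."""
    joint = support(set(antecedent) | set(consequent), transactions)
    denom = support(antecedent, transactions) * support(consequent, transactions)
    return joint / denom if denom else 0.0


# Hypothetical baskets; two of them hold a single item only.
baskets = [
    ["beer", "diapers"],
    ["beer", "diapers", "milk"],
    ["milk"],
    ["bread", "milk"],
    ["beer"],
]

# Drop transactions with fewer than two distinct items: they cannot
# provide both an antecedent and a consequent.
multi = [b for b in baskets if len(set(b)) > 1]

# Keep only the candidate rules whose lift is above 1.
candidates = [(["beer"], ["diapers"]), (["beer"], ["milk"])]
kept = [(a, c) for a, c in candidates if lift(a, c, multi) > 1.0]
print(kept)
```

Note that dropping the one-item baskets also changes the denominator, so support values computed on `multi` are not the same as those computed on the full data.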

Maybe you could improve the grouping of items: instead of coke, pepsi, and so on, work with "soft drink", and remove one-item transactions.

Upvotes: 0

Has QUIT--Anony-Mousse

Reputation: 77454

The support is based on the number of transactions.

The quantity of an item thus does not matter for the support by definition.
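To illustrate the definition, here is a minimal sketch in plain Python, not arules. The `support` helper and the toy baskets are made up for illustration. A transaction counts at most once for an itemset, so a basket with three cokes contributes exactly as much as a basket with one.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical baskets; the first one lists "coke" three times.
with_duplicates = [
    ["coke", "coke", "coke", "chips"],
    ["coke", "bread"],
    ["bread", "milk"],
    ["chips", "milk"],
]

# The same baskets with repeated items collapsed.
deduplicated = [sorted(set(t)) for t in with_duplicates]

# Both calls count 2 of the 4 transactions, so quantity has no effect.
print(support(["coke"], with_duplicates))
print(support(["coke"], deduplicated))
```

So keeping the duplicates would not raise the support, even if the library allowed it.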

Your problem is probably that you did not preprocess your data well enough. For association rules, it usually seems to be necessary to work with product groups or classes rather than individual product codes. That is, find rules with "beer" and "milk" rather than "Wilmaukee's worst 12 oz. can 24 pack" and "FUGGIES UnderNites Diapers, Size 4, 56 ct, BIG PACK". Merging such overdifferentiated products does improve the support.
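A small sketch of that preprocessing step in plain Python. The `PRODUCT_GROUP` mapping, the helper names, and the baskets are all hypothetical; in practice the mapping would come from your product hierarchy.

```python
from collections import Counter

# Hypothetical mapping from individual product codes to product groups.
PRODUCT_GROUP = {
    "coke 330ml": "soft drink",
    "pepsi 2l": "soft drink",
    "lager 6 pack": "beer",
    "ale 12 pack": "beer",
    "diapers size 4": "diapers",
    "diapers size 5": "diapers",
}


def to_groups(transaction):
    """Replace each product by its group; unmapped products are kept as-is."""
    return {PRODUCT_GROUP.get(item, item) for item in transaction}


def item_support(transactions):
    """Support of each single item: share of transactions containing it."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n / len(transactions) for item, n in counts.items()}


raw = [
    ["lager 6 pack", "diapers size 4"],
    ["ale 12 pack", "diapers size 5"],
    ["coke 330ml", "lager 6 pack"],
    ["pepsi 2l", "diapers size 4"],
]
grouped = [to_groups(t) for t in raw]

# No single product appears in more than 2 of the 4 baskets,
# while the merged "beer" group appears in 3 of them.
print(max(item_support(raw).values()))
print(item_support(grouped)["beer"])
```

The grouped transactions can then be passed to whatever Apriori implementation you use in place of the raw product codes.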

Upvotes: 0
