Diego Serrano

Reputation: 1016

Frequent itemset best algorithm and library

I started working on machine learning projects a few days ago and I have the following situation:

I have a database of itineraries (an itinerary is a set of destinations that were selected together as part of a trip), and I want to predict whether a destination will be selected as part of a trip given the other selected destinations. Here is an example, where A, B, C and D are destinations:

A, B -> C
A, D, C -> B

I think this is a recommender system problem, and I have been studying techniques to approach it.

I tried using WEKA's Apriori and FPGrowth, but I have not been able to get a result. I have 91 items and 12,000 transactions (an ARFF file with 91 columns and 12,000 rows of TRUE and FALSE values), and the program never finishes, although it never uses more than 5 GB of RAM (I let the algorithm run for 30 hours on a PC with a latest-generation Core i7 and 12 GB of RAM). Also, I don't see any option to keep only the rules whose implied value is TRUE (I need this because I want to see whether someone will travel to X given that other people travel to Y).
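One likely cause, offered as an assumption rather than something the question confirms: with TRUE/FALSE columns, each `attribute=value` pair can be treated as an item, so a value such as `Rome=FALSE`, which appears in most trips, is frequent and the number of frequent itemsets explodes. The usual workaround is to keep only the TRUE cells, so each row becomes the set of destinations actually selected. A minimal Python sketch (the function name `rows_to_transactions` and the sample rows are hypothetical):

```python
from typing import List, Set


def rows_to_transactions(columns: List[str], rows: List[List[bool]]) -> List[Set[str]]:
    """Turn each TRUE/FALSE row into the set of destinations marked TRUE."""
    transactions = []
    for row in rows:
        # FALSE cells are dropped instead of becoming items such as "Rome=FALSE".
        transactions.append({name for name, selected in zip(columns, row) if selected})
    return transactions


# Hypothetical sample: 4 destination columns, 3 trips.
columns = ["Rome", "Florence", "Milan", "Venice"]
rows = [
    [True, True, True, False],
    [True, False, True, True],
    [False, True, True, False],
]
transactions = rows_to_transactions(columns, rows)
```

With this representation, every rule mined from the data automatically has TRUE values only, on both sides of the arrow.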

So, are there any other techniques or approaches that can achieve the result I am expecting? I would like the output to be a file listing the "rules" (the sets of items that "imply" other sets of items) together with the probability of each "recommendation".

Example:

A, B -> C ; 90% 
In words: "People who travel to Rome and Florence also travel to Milan with a probability (or other measure) of 90%"
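The "probability" in this example corresponds to what association rule mining calls the confidence of a rule: confidence(A → C) = support(A ∪ C) / support(A), where support is the fraction of trips containing all the listed destinations. A small Python sketch on made-up trips (the data and function names are illustrative only):

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of trips that contain every destination in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str], transactions: List[Set[str]]) -> float:
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Hypothetical trips.
trips = [
    {"Rome", "Florence", "Milan"},
    {"Rome", "Florence", "Milan", "Venice"},
    {"Rome", "Florence"},
    {"Rome", "Venice"},
]
conf = confidence({"Rome", "Florence"}, {"Milan"}, trips)
print(f"Rome, Florence -> Milan ; {conf:.0%}")
```

Here 3 of the 4 trips contain Rome and Florence, and 2 of those also contain Milan, so the rule's confidence is 2/3, printed as 67%.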

Thanks!

Upvotes: 0

Views: 290

Answers (2)

Phil

Reputation: 3520

Actually, the implementation in Weka is quite inefficient. You could check the SPMF data mining library in Java, which offers efficient implementations of pattern mining algorithms. It actually has more than 100 algorithms, including Apriori, FPGrowth and many others. I would recommend using FPGrowth, which is very fast and memory efficient, but you could also check the other algorithms. By the way, I am the founder of the library.
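To feed the itineraries to SPMF, the data has to be in its plain-text transaction format. As I understand the SPMF documentation (please verify there, along with the exact command for running FPGrowth), each line is one transaction, and items are positive integers separated by single spaces and sorted in ascending order. A hedged Python sketch of that conversion; `to_spmf_lines` and `write_spmf_file` are hypothetical helpers, not part of SPMF:

```python
from typing import Dict, List, Set, Tuple


def to_spmf_lines(transactions: List[Set[str]]) -> Tuple[List[str], Dict[int, str]]:
    """Encode each non-empty trip as a line of ascending, space-separated integer ids."""
    names = sorted({item for t in transactions for item in t})
    ids = {name: i + 1 for i, name in enumerate(names)}  # ids start at 1 (positive integers)
    lines = [
        " ".join(str(i) for i in sorted(ids[name] for name in t))
        for t in transactions
        if t  # skip trips with no selected destination
    ]
    id_to_name = {i: name for name, i in ids.items()}  # needed to decode the mined rules
    return lines, id_to_name


def write_spmf_file(path: str, transactions: List[Set[str]]) -> Dict[int, str]:
    """Write the encoded trips to path and return the id -> destination mapping."""
    lines, id_to_name = to_spmf_lines(transactions)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return id_to_name


lines, id_to_name = to_spmf_lines([{"Rome", "Milan"}, {"Florence"}, set()])
```

Keeping the `id_to_name` mapping lets you translate the numeric itemsets in SPMF's output back into destination names.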

Upvotes: 0

n01dea

Reputation: 1580

Something with your implementation of the Apriori algorithm doesn't seem right. Try another implementation of the Apriori algorithm or check the current one. For the stated purpose of generating association rules between the destinations, Apriori or the faster FP-Growth algorithm is perfectly adequate. Maybe this helps for a general understanding: R - association rules - apriori
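For a general understanding of what Apriori does, here is a minimal, untuned Python sketch of the level-wise search (join and prune steps) followed by rule generation. The rules are restricted to a single destination on the right-hand side, which matches the question "will someone travel to X?", and are printed in the `A, B -> C ; 90%` format from the question. The thresholds and sample trips are hypothetical. Each candidate triggers a full scan of the transactions, which is the cost FP-Growth is designed to avoid.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str, float]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def sup(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {i for t in transactions for i in t}
    level = {c: s for c in (frozenset([i]) for i in items) if (s := sup(c)) >= min_support}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets that yield a k-itemset.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        level = {}
        for c in candidates:
            # Prune step: every (k-1)-subset must already be frequent.
            if all(frozenset(s) in frequent for s in combinations(c, k - 1)):
                s = sup(c)
                if s >= min_support:
                    level[c] = s
    return frequent


def single_consequent_rules(frequent: Dict[FrozenSet[str], float], min_confidence: float) -> List[Rule]:
    """Rules 'antecedent -> target' with one destination on the right-hand side."""
    rules: List[Rule] = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue
        for target in itemset:
            antecedent = itemset - {target}
            conf = s / frequent[antecedent]  # subsets of frequent itemsets are frequent
            if conf >= min_confidence:
                rules.append((antecedent, target, conf))
    rules.sort(key=lambda r: (-r[2], sorted(r[0]), r[1]))
    return rules


def format_rule(rule: Rule) -> str:
    antecedent, target, conf = rule
    return f"{', '.join(sorted(antecedent))} -> {target} ; {conf:.0%}"


# Hypothetical trips.
trips = [
    {"Rome", "Florence", "Milan"},
    {"Rome", "Florence", "Milan"},
    {"Rome", "Florence", "Venice"},
    {"Rome", "Milan"},
    {"Florence", "Milan"},
]
rules = single_consequent_rules(apriori(trips, min_support=0.4), min_confidence=0.6)
for line in map(format_rule, rules):
    print(line)
```

Writing the printed lines to a file gives the kind of output the question asks for; a mature implementation (Weka, SPMF, or an R package) should be preferred for the real 12,000-trip dataset.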

Upvotes: 1
