Reputation: 61
I am looking for an association mining algorithm that mines frequent itemsets of length 2 only. When stopping at 2-itemsets, is it better to compute the frequent itemsets with a query on the database?
Upvotes: 1
Views: 540
Reputation: 3520
If your input is a text file and you only want itemsets of length 2, you can scan the file once and count the support of every 2-itemset directly. This is very efficient.
In this case you don't need Apriori, FPGrowth, or any other fancy algorithm. A FOR loop over your file plus a map that stores the frequency of each pair of items you encounter during the scan is enough.
When the scan ends, you have the support of every 2-itemset and can output only those with support >= minsup.
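The single-scan approach might look roughly like the following sketch. It assumes each line is one transaction of whitespace-separated items and that `minsup` is an absolute count of transactions; the function name `frequent_pairs` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(lines, minsup):
    """Count the support of every 2-itemset in one pass over the transactions.

    Assumes each line is one transaction of whitespace-separated items and
    minsup is an absolute count (number of transactions).
    """
    counts = Counter()
    for line in lines:
        # Deduplicate and sort so each unordered pair gets one canonical key.
        items = sorted(set(line.split()))
        for pair in combinations(items, 2):
            counts[pair] += 1
    # After the scan, keep only pairs whose support reaches the threshold.
    return {pair: n for pair, n in counts.items() if n >= minsup}


if __name__ == "__main__":
    transactions = ["a b c", "a c", "b c d", "a b c"]
    print(frequent_pairs(transactions, minsup=2))
    # {('a', 'b'): 2, ('a', 'c'): 3, ('b', 'c'): 3}
```

Because a file object iterates line by line, the same function can be called as `frequent_pairs(open("transactions.txt"), 2)` (file name hypothetical) without loading the whole file into memory.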
Another way is to count the support of each pair of items in a triangular matrix instead of a map. It is a little faster than a map, but it may waste more memory if your data is sparse.
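A minimal sketch of the triangular-matrix variant, assuming items have already been encoded as integers `0..n-1` (the helper names `pair_index` and `frequent_pairs_triangular` are hypothetical). The upper triangle is stored as a flat list, and pair `(i, j)` with `i < j` maps to slot `i*(2n-i-1)/2 + (j-i-1)`.

```python
from itertools import combinations


def pair_index(i, j, n):
    """Map the pair (i, j), 0 <= i < j < n, to a slot in a flat upper triangle."""
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def frequent_pairs_triangular(transactions, n, minsup):
    """Count 2-itemset supports in a triangular array instead of a map.

    Assumes items are integers 0..n-1. The array always holds n*(n-1)/2
    counters, even for pairs that never occur in the data.
    """
    counts = [0] * (n * (n - 1) // 2)
    for t in transactions:
        # Sorted, deduplicated items guarantee i < j for every pair.
        for i, j in combinations(sorted(set(t)), 2):
            counts[pair_index(i, j, n)] += 1
    result = {}
    for i in range(n):
        for j in range(i + 1, n):
            support = counts[pair_index(i, j, n)]
            if support >= minsup:
                result[(i, j)] = support
    return result


if __name__ == "__main__":
    data = [[0, 1, 2], [0, 2], [1, 2, 3], [0, 1, 2]]
    print(frequent_pairs_triangular(data, n=4, minsup=2))
    # {(0, 1): 2, (0, 2): 3, (1, 2): 3}
```

The memory trade-off is visible in the allocation: with 100,000 distinct items the array needs about 5 billion counters whether or not those pairs ever co-occur, while a map only stores pairs that actually appear.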
Upvotes: 1
Reputation: 77454
Itemsets of length 2 gain little from pruning rules such as monotonicity: at most, you can discard pairs that contain an infrequent single item.
You can probably compute the 2-itemsets using clever JOINs with little cost in performance (in fact, your DBMS will likely optimize this better than your own code would).
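One way to express this as a self-JOIN is sketched below, using Python's built-in `sqlite3` so the example is self-contained. It assumes a hypothetical table `transactions(tid, item)` with one row per item occurrence and no duplicate `(tid, item)` rows. The `a.item < b.item` condition makes each unordered pair appear once per transaction.

```python
import sqlite3


def frequent_pairs_sql(conn, minsup):
    """Count 2-itemset support with a self-join on a (tid, item) table.

    Assumes a hypothetical table transactions(tid, item) with one row per
    item occurrence and no duplicate (tid, item) rows.
    """
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM transactions AS a
        JOIN transactions AS b
          ON a.tid = b.tid AND a.item < b.item  -- each unordered pair once
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    return conn.execute(query, (minsup,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    rows = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "c"),
            (3, "b"), (3, "c"), (3, "d"), (4, "a"), (4, "b"), (4, "c")]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    print(frequent_pairs_sql(conn, 2))
    # [('a', 'b', 2), ('a', 'c', 3), ('b', 'c', 3)]
```

The query itself is plain SQL and should carry over to PostgreSQL, although the parameter placeholder differs by driver (for example, psycopg uses `%s` rather than `?`). An index on `(tid, item)` would likely help the join on large tables.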
See MADlib for a library that runs Frequent Itemset Mining via SQL on PostgreSQL databases.
Upvotes: 0