Reputation: 61

Pattern mining for item sets of length 2

I am looking for an association mining algorithm that mines frequent itemsets of length 2 only. Since I stop at 2-itemsets, would it be better to compute the frequent itemsets with a database query instead?

Upvotes: 1

Answers (2)

Phil

Reputation: 3520

If your input is a text file and you just want to find itemsets of length 2, you can scan the file once and count the support of each 2-itemset directly. This is very efficient.

For this case, you don't need Apriori, FPGrowth, or any other fancy algorithm. You can just use a FOR loop over your file and a map that stores the frequency of each pair of items you encounter while scanning.

When the scan ends, you will have the support of every 2-itemset, and you can output only those with a support >= minsup.
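
The map-based scan described above could look roughly like the following minimal sketch. It assumes each transaction is already available as a list of item names (for a text file, you would split each line into items instead); the function name `frequent_pairs` and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, minsup):
    """Count every pair of items in one pass and keep pairs with support >= minsup."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair is counted once per transaction
        # and always stored in the same (a, b) order.
        items = sorted(set(transaction))
        counts.update(combinations(items, 2))
    return {pair: n for pair, n in counts.items() if n >= minsup}


# Hypothetical sample data, one list per transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]

result = frequent_pairs(transactions, minsup=2)
print(result)
```

Only pairs that actually occur get an entry in the map, which is why this approach tends to be memory-friendly on sparse data.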

Another way is to use a triangular matrix instead of a map to count the support of each pair of items. It would be a little bit faster than a map, but it may waste memory if your data is sparse.
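
As a rough sketch of the triangular-matrix idea: give every item an integer id, then store the count for the pair (i, j) with i < j in a flat array of n*(n-1)/2 cells. The function name, the flat-array layout, and the sample data below are assumptions chosen for illustration, and the sketch needs the transactions in a list because it reads them twice (once to assign ids, once to count).

```python
from itertools import combinations


def frequent_pairs_triangular(transactions, minsup):
    """Count pairs in a flat upper-triangular array indexed by item ids."""
    # First pass: assign an integer id to every distinct item.
    items = sorted({item for t in transactions for item in t})
    index = {item: i for i, item in enumerate(items)}
    n = len(items)

    def cell(i, j):
        # Position of pair (i, j), i < j, in the flattened upper triangle.
        return i * (2 * n - i - 1) // 2 + (j - i - 1)

    counts = [0] * (n * (n - 1) // 2)

    # Second pass: increment the cell of every pair in each transaction.
    for t in transactions:
        ids = sorted({index[item] for item in t})
        for i, j in combinations(ids, 2):
            counts[cell(i, j)] += 1

    return {
        (items[i], items[j]): counts[cell(i, j)]
        for i, j in combinations(range(n), 2)
        if counts[cell(i, j)] >= minsup
    }


# Hypothetical sample data, one list per transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]

result = frequent_pairs_triangular(transactions, minsup=2)
print(result)
```

The array has a cell for every possible pair, whether or not the pair ever occurs, which is where the wasted memory on sparse data comes from.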

Upvotes: 1

Has QUIT--Anony-Mousse

Reputation: 77454

Itemsets of length 2 don't benefit from pruning rules such as monotonicity.

You can probably compute the 2-itemsets using clever JOINs with little cost in performance (and in fact, your DBMS will likely accelerate this better than your own code).
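
One possible shape for such a JOIN is a self-join on the transaction id, sketched here with Python's built-in sqlite3 module so that it runs as-is. The table name `baskets`, its columns `tid` and `item`, and the sample rows are all hypothetical, and the query assumes each (tid, item) row appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")

# Hypothetical sample data: one row per (transaction id, item).
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "bread"), (3, "milk"), (3, "eggs"),
    (4, "butter"), (4, "eggs"),
]
conn.executemany("INSERT INTO baskets VALUES (?, ?)", rows)

# Self-join on the transaction id; a.item < b.item keeps each pair once
# and excludes pairing an item with itself.
query = """
    SELECT a.item AS item_a, b.item AS item_b, COUNT(*) AS support
    FROM baskets AS a
    JOIN baskets AS b
      ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY support DESC
"""

minsup = 2
pairs = conn.execute(query, (minsup,)).fetchall()
print(pairs)
```

On a real DBMS, an index on the transaction id column would likely help the join, but that depends on the engine and the data.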

See MADlib for a library to run Frequent Itemset Mining via SQL on PostgreSQL databases.

Upvotes: 0
