suvojyotic

Reputation: 25

Spark MLlib FPGrowth running but not displaying frequent item sets

I am trying to do a basic market basket analysis on transaction data with FPGrowth from MLlib. I have prepared the transactions in the following format:

    transactions.take(3)
    res632: Array[Array[String]] = Array(Array(7976503128), Array(68113132893, 1800000725, 3120027015, 4850030414, 2100061223, 5150055538, 60538871457), Array(68113174202))

The individual numbers in the arrays are my product IDs, stored as strings (for example 68113132893, 7976503128, etc.).

When I run the FPGrowth model, it runs without any errors:

    import org.apache.spark.mllib.fpm.FPGrowth

    val fpg = new FPGrowth()
        .setMinSupport(0.5)
        .setNumPartitions(10)
    val modelBuild = fpg.run(transactions)

    fpg: org.apache.spark.mllib.fpm.FPGrowth = org.apache.spark.mllib.fpm.FPGrowth@74a103be
    modelBuild: org.apache.spark.mllib.fpm.FPGrowthModel[String] = org.apache.spark.mllib.fpm.FPGrowthModel@391b111a

But when I try to get the frequent itemsets, the result is an empty array:

    modelBuild.freqItemsets.collect().foreach { itemset =>
        println(itemset.freq)
    }

    res660: Array[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[String]] = Array()

I cannot figure out what is going wrong. Please help!

Upvotes: 1

Views: 553

Answers (1)

user3514004

Reputation: 46

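A minSupport of 0.5 means an itemset must appear in at least half of all transactions to be reported, which is rarely the case for product-level basket data. Decrease minSupport (for example to 0.00001) and the itemsets will be printed. From the Spark documentation: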

minSupport: the minimum support for an itemset to be identified as frequent. For example, if an item appears 3 out of 5 transactions, it has a support of 3/5=0.6.
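
To see why 0.5 returns nothing here, it may help to compute single-item support by hand. The following is only a rough sketch in plain Scala collections (no Spark involved); the object name `SupportSketch` and the helper `itemSupport` are made up for illustration, and the data is just the three sample transactions from the question, not the full dataset.

```scala
object SupportSketch {
  // The three sample transactions shown in the question (illustration only).
  val transactions: Seq[Array[String]] = Seq(
    Array("7976503128"),
    Array("68113132893", "1800000725", "3120027015", "4850030414",
          "2100061223", "5150055538", "60538871457"),
    Array("68113174202")
  )

  // Hypothetical helper: fraction of transactions that contain each item.
  def itemSupport(txns: Seq[Array[String]]): Map[String, Double] = {
    val total = txns.size.toDouble
    txns
      .flatMap(_.distinct.toSeq)   // count an item at most once per transaction
      .groupBy(identity)
      .map { case (item, hits) => item -> hits.size / total }
  }

  def main(args: Array[String]): Unit = {
    val support = itemSupport(transactions)
    val minSupport = 0.5
    // Keep only the items whose support reaches the threshold.
    val frequent = support.filter { case (_, s) => s >= minSupport }
    println(s"highest single-item support: ${support.values.max}")
    println(s"items meeting minSupport=$minSupport: ${frequent.size}")
  }
}
```

In this sample every product occurs in only one of the three transactions, so no single item reaches a support of 0.5, and no larger itemset can either. Running the same kind of count on the real data should give a rough idea of a workable minSupport value.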

Upvotes: 3
