I am trying to do a basic market basket analysis with FPGrowth from MLlib on transaction data. I have prepared the transactions in the following format:
transactions.take(3)
res632: Array[Array[String]] = Array(Array(7976503128), Array(68113132893, 1800000725, 3120027015, 4850030414, 2100061223, 5150055538, 60538871457), Array(68113174202))
The individual elements of the arrays are my product IDs, stored as strings (e.g. 68113132893, 7976503128).
When I run the FPGrowth model, it completes without any errors:
import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth()
  .setMinSupport(0.5)
  .setNumPartitions(10)
val modelBuild = fpg.run(transactions)
fpg: org.apache.spark.mllib.fpm.FPGrowth = org.apache.spark.mllib.fpm.FPGrowth@74a103be
modelBuild: org.apache.spark.mllib.fpm.FPGrowthModel[String] = org.apache.spark.mllib.fpm.FPGrowthModel@391b111a
But when I try to get the frequent itemsets, I get an empty array:
modelBuild.freqItemsets.collect().foreach { itemset =>
println(itemset.freq)
}
res660: Array[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[String]] = Array()
I can't figure out what is going wrong. Please help!
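Decrease minSupport (for example to 0.00001) and the frequent itemsets will be printed. With setMinSupport(0.5), an itemset has to appear in at least half of all transactions to count as frequent; with many distinct product IDs spread over many baskets, typically no itemset reaches that, so freqItemsets comes back empty. From the Spark documentation: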
minSupport: the minimum support for an itemset to be identified as frequent. For example, if an item appears 3 out of 5 transactions, it has a support of 3/5=0.6.
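To see why 0.5 filters everything out, here is a minimal, Spark-free sketch that computes support by hand on the three sample transactions from transactions.take(3). The names SupportCheck, support and minCount are hypothetical helpers for illustration only; the ceil(minSupport * numTransactions) conversion in minCount is an assumption about how MLlib turns the fraction into a transaction count, not a quote of its source.

```scala
// Spark-free illustration of itemset support (hypothetical helper names).
object SupportCheck {
  // Fraction of transactions that contain every item of `itemset`.
  def support(transactions: Seq[Set[String]], itemset: Set[String]): Double =
    transactions.count(t => itemset.subsetOf(t)).toDouble / transactions.size

  // Minimum number of transactions an itemset must appear in to reach `minSupport`.
  // Assumption: mirrors a ceil(minSupport * count) conversion.
  def minCount(minSupport: Double, numTransactions: Long): Long =
    math.ceil(minSupport * numTransactions).toLong

  def main(args: Array[String]): Unit = {
    // The three sample transactions shown in the question.
    val transactions = Seq(
      Set("7976503128"),
      Set("68113132893", "1800000725", "3120027015", "4850030414",
        "2100061223", "5150055538", "60538871457"),
      Set("68113174202")
    )
    val items = transactions.flatten.distinct

    for (minSupport <- Seq(0.5, 0.00001)) {
      // Keep single items whose support reaches the threshold.
      val frequent = items.filter(i => support(transactions, Set(i)) >= minSupport)
      println(s"minSupport=$minSupport minCount=${minCount(minSupport, transactions.size)} " +
        s"frequent single items=${frequent.size}")
    }
  }
}
```

Each product here appears in only 1 of 3 transactions (support 1/3), so at minSupport 0.5 none of the 9 items qualifies, while at 0.00001 the required count drops to 1 and all 9 do.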