Reputation: 23
Given results generated by a PrefixSpan model, how can I save all of them to files, sorted by frequency? The result's data structure doesn't directly support the saveAsTextFile function.
result = model.freqSequences().collect()
print(result)
for fs in result:
    print('{}, {}'.format(fs.sequence, fs.freq))
The expected results are something like:
[[20],[3]], 8,11.42%
[[7]], 6,8.57%
[[13]], 2,2.85%
Upvotes: 1
Views: 253
Reputation: 37832
If you just want to save the freqSequences, simply don't call collect on that RDD, and you can save it into a text file:
model.freqSequences().saveAsTextFile("path")
However, that might not be that useful, as it will save the result of toString on each of the RDD's records (of type PrefixSpan.FreqSequence[Item]), which might not be easy to parse later on. So you can format these records into Strings yourself, to write them in a format you'll be able to use:
// Format an array using brackets, if that's how you want it;
// You can implement whatever format you want...
def format[T](t: T): String = t match {
  case a: Array[_] => a.map(t => s"[${format(t)}]").mkString(",")
  case _ => t.toString
}

// In the Scala API, freqSequences is a field, so it takes no parentheses
model.freqSequences
  .map(fs => s"${format(fs.sequence)}, ${fs.freq}")
  .saveAsTextFile("path")
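The snippet above does not sort by frequency or add the percentage column shown in the question. Here is a minimal Python sketch of that part, run on already-collected records. The FreqSequence namedtuple is a hypothetical stand-in for PySpark's record type (fields sequence and freq), and total=70 is an assumption: it is the number of input sequences that would match the sample percentages. Percentages are rounded here, whereas the question's sample appears to truncate them.

```python
from collections import namedtuple

# Hypothetical stand-in for PySpark's FreqSequence record (fields: sequence, freq).
FreqSequence = namedtuple("FreqSequence", ["sequence", "freq"])


def format_sequence(sequence):
    """Render a list of itemsets in bracket form, e.g. [[20],[3]]."""
    itemsets = ("[" + ",".join(str(item) for item in itemset) + "]"
                for itemset in sequence)
    return "[" + ",".join(itemsets) + "]"


def format_record(fs, total):
    """Render one record as 'sequence, freq,percent%'."""
    percent = 100.0 * fs.freq / total  # share of all input sequences
    return "{}, {},{:.2f}%".format(format_sequence(fs.sequence), fs.freq, percent)


def sorted_lines(records, total):
    """Sort records by descending frequency and format each one."""
    ordered = sorted(records, key=lambda fs: fs.freq, reverse=True)
    return [format_record(fs, total) for fs in ordered]


# Sample data taken from the question; total=70 is an assumed dataset size.
records = [
    FreqSequence([[13]], 2),
    FreqSequence([[20], [3]], 8),
    FreqSequence([[7]], 6),
]
lines = sorted_lines(records, total=70)
for line in lines:
    print(line)
```

To stay distributed instead of collecting, the same ordering should be obtainable in PySpark with model.freqSequences().sortBy(lambda fs: fs.freq, ascending=False), followed by map with a formatter like format_record and then saveAsTextFile. The output is then split across part files, one per partition.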
Lastly, if you're saving these results in order to apply this model using Spark later, consider using save directly:
model.save(sc, "path")
Which can later be loaded using:
PrefixSpanModel.load(sc, "path")
Upvotes: 0