icecream

Reputation: 23

How to save prefixSpan results into files using Scala?

Given the results generated by a prefixSpan model, how can I save all of them to files, sorted by frequency? The result's data structure doesn't directly support the saveAsTextFile function.

result = model.freqSequences().collect()
print(result)
for fs in result:
    print('{}, {}'.format(fs.sequence, fs.freq))

The expected results are something like:

[[20],[3]], 8,11.42%
[[7]], 6,8.57%
[[13]], 2,2.85%

Upvotes: 1

Views: 253

Answers (1)

Tzach Zohar

Reputation: 37832

If you just want to save the freqSequences, don't call collect on that RDD; save it to a text file directly (in Scala, freqSequences is a val, so it takes no parentheses):

model.freqSequences.saveAsTextFile("path")

However, that output may not be very useful: it stores the result of toString for each of the RDD's records (of type PrefixSpan.FreqSequence[Item]), which may be hard to parse later. Instead, you can format these records into Strings yourself, in whatever layout you'll be able to use:

// Format an array using brackets, if that's how you want it;
// you can implement whatever format you want...
def format[T](t: T): String = t match {
  // Wrap each array once, so Array(Array(20), Array(3)) becomes [[20],[3]]
  case a: Array[_] => a.map(x => format(x)).mkString("[", ",", "]")
  case _ => t.toString
}

// freqSequences is a val on the Scala model, so it takes no parentheses
model.freqSequences
  .map(fs => s"${format(fs.sequence)}, ${fs.freq}")
  .saveAsTextFile("path")
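The question also asks for output sorted by frequency, with a percentage column. Here is a minimal sketch of that part, written as plain Scala so it runs without Spark. The object name FreqReport, its helpers, and the `total` parameter are hypothetical; `total` is assumed to be the number of input sequences, which the question does not state. The percentage is rounded to two decimals, so it may differ in the last digit from the truncated values shown in the question.

```scala
import java.util.Locale

// Hypothetical helper object, not part of Spark.
object FreqReport {
  // Render a sequence of itemsets as [[20],[3]].
  def render[T](sequence: Array[Array[T]]): String =
    sequence.map(_.mkString("[", ",", "]")).mkString("[", ",", "]")

  // One output line: sequence, count, and percentage of `total`
  // (assumed to be the number of input sequences). Rounded, not truncated.
  def line[T](sequence: Array[Array[T]], freq: Long, total: Long): String = {
    val pct = "%.2f".formatLocal(Locale.ROOT, 100.0 * freq / total)
    s"${render(sequence)}, $freq,$pct%"
  }

  // Sort (sequence, freq) pairs by descending frequency, then format each one.
  def report[T](results: Seq[(Array[Array[T]], Long)], total: Long): Seq[String] =
    results
      .sortBy { case (_, freq) => -freq }
      .map { case (sequence, freq) => line(sequence, freq, total) }
}
```

On the RDD itself, the same idea would be model.freqSequences.sortBy(_.freq, ascending = false).map(fs => FreqReport.line(fs.sequence, fs.freq, total)).saveAsTextFile("path"). Keep in mind that saveAsTextFile writes one part file per partition, so the sorted order runs across the part files in name order; calling coalesce(1) before saving produces a single file at the cost of parallelism.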

Lastly, if you're saving these results in order to apply this model using Spark later, consider using save directly:

model.save(sc, "path")

Which can later be loaded using:

PrefixSpanModel.load(sc, "path")

Upvotes: 0
