Reputation: 69
I have a Spark RDD as below.
abc,def,ghi
sdfdf,sdfsdf,dfdf444sdsdd
I want to transform every record (line) by splitting it on commas (,) and producing every distinct combination of two of the split values.
For example, the output RDD for the first record would be:
abc def
abc ghi
def ghi
Upvotes: 0
Views: 221
Reputation: 1532
Use flatMap for the combinations part, since each input record produces several output records. Something like the following:
rdd.map(_.split(",").toList).flatMap(tokens => getCombinations(tokens))
...where getCombinations has the signature:
def getCombinations(tokens: List[String]): List[(String, String)]
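The answer leaves the body of getCombinations open. One possible way to fill it in, as a minimal sketch rather than the answerer's own implementation, is to lean on the standard library's combinations method. The wrapper object name Combinations is an assumption made here only so the snippet compiles on its own; it needs no Spark dependency.

```scala
// Hypothetical wrapper object, introduced only to make the sketch self-contained.
object Combinations {
  // Returns every distinct unordered pair of tokens, in the order
  // produced by the standard library's `combinations` method.
  def getCombinations(tokens: List[String]): List[(String, String)] =
    tokens
      .combinations(2)                          // Iterator[List[String]], each of size 2
      .collect { case List(a, b) => (a, b) }    // turn each 2-element list into a tuple
      .toList

  def main(args: Array[String]): Unit = {
    // Mimics one record from the question after splitting on commas.
    val tokens = "abc,def,ghi".split(",").toList
    getCombinations(tokens).foreach { case (a, b) => println(s"$a $b") }
  }
}
```

To get the space-separated lines shown in the question, the pairs could then be formatted with something like .map { case (a, b) => s"$a $b" } after the flatMap. Note that a record with fewer than two tokens yields no pairs under this sketch.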
Upvotes: 1