Reputation: 386
This is probably a simple problem, but I am just beginning my adventure with Spark.
Problem: I'd like to get the structure below (expected result) in Spark. Currently I have the following structure:
title1, {word11, word12, word13 ...}
title2, {word21, word22, word23 ...}
The data is stored in a Dataset[(String, Seq[String])].
Expected result: I would like to get tuples of (word, title):
word11, {title1}
word12, {title1}
What I do so far:
1. Make (title, Seq[word1, word2, word3]):
docs.mapPartitions { iter =>
  iter.map {
    case (title, contents) =>
      // lemmatize the raw contents of each document
      val textToLemmas: Seq[String] = toText(....)
      (title, textToLemmas)
  }
}
Thanks for any answers.
Upvotes: 3
Views: 731
Reputation: 74739
I'm surprised no one offered a solution with Scala's for-comprehension, which gets "desugared" into flatMap and map (as in Yuval Itzchakov's answer) at compile time.
When you see a series of flatMap and map calls (possibly with filter), that's Scala's for-comprehension.
So the following:
val result = dataSet.flatMap { case (title, words) => words.map((_, title)) }
is equivalent to the following:
val result = for {
  (title, words) <- dataSet
  w <- words
} yield (w, title)
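As a quick, self-contained check on plain Scala collections (no Spark needed; the sample data below is made up for illustration), both forms yield the same pairs:

// A minimal sketch on plain Scala collections, with made-up sample data,
// showing that the for-comprehension and the flatMap/map chain are equivalent.
val data = Seq(
  ("title1", Seq("word11", "word12")),
  ("title2", Seq("word21", "word22"))
)

val viaFlatMap = data.flatMap { case (title, words) => words.map((_, title)) }

val viaFor = for {
  (title, words) <- data
  w <- words
} yield (w, title)

assert(viaFlatMap == viaFor)
// Both: List((word11,title1), (word12,title1), (word21,title2), (word22,title2))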
After all, that's why we enjoy the flexibility of Scala, isn't it?
Upvotes: 2
Reputation: 2434
Another solution is to call the explode function, like this:
import org.apache.spark.sql.functions.{col, explode}

// explode expects a Column, not a String, so wrap the column name in col(...)
dataset.withColumn("_2", explode(col("_2"))).as[(String, String)]
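For context, here is a minimal end-to-end sketch; the local SparkSession and the sample data are assumptions for illustration (a tuple Dataset names its columns _1 and _2 by default):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// A minimal sketch, assuming a local SparkSession and made-up sample data.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds = Seq(
  ("title1", Seq("word11", "word12")),
  ("title2", Seq("word21", "word22"))
).toDS()  // Dataset[(String, Seq[String])] with columns _1 and _2

// explode produces one output row per element of the array column
val pairs = ds.withColumn("_2", explode(col("_2"))).as[(String, String)]
pairs.show()
// +------+------+
// |    _1|    _2|
// +------+------+
// |title1|word11|
// |title1|word12|
// |title2|word21|
// |title2|word22|
// +------+------+

// To get the asked-for (word, title) order, swap the tuple afterwards:
// pairs.map(_.swap)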
Hope this helps you. Best regards.
Upvotes: 2
Reputation: 149598
This should work:
val result = dataSet.flatMap { case (title, words) => words.map((_, title)) }
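As a usage sketch (the sample data and the spark.implicits._ import are assumptions, not part of the answer):

// A minimal sketch, assuming spark.implicits._ is in scope for the Dataset encoders.
import spark.implicits._

val dataSet = Seq(("title1", Seq("word11", "word12"))).toDS()
val result = dataSet.flatMap { case (title, words) => words.map((_, title)) }
result.show()
// +------+------+
// |    _1|    _2|
// +------+------+
// |word11|title1|
// |word12|title1|
// +------+------+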
Upvotes: 3