Reputation: 19
I have a text file whose lines contain a userid and a rid separated by | (pipe). Each rid value corresponds to many labels in another file.
How can I use flatMap to implement a method as follows:
xRdd = sc.textFile("file.txt").flatMap { line =>
val (userid,rid) = line.split("\\|")
val labelsArr = getLabels(rid)
labelsArr.foreach{ i =>
((userid, i), 1)
}
}
At compile time, I get an error:
type mismatch; found : Unit required: TraversableOnce[?]
Upvotes: 2
Views: 1871
Reputation: 74759
This is exactly why I said here and here that Scala's for-comprehension can make things easier, and it should help you out too.
When you see a series of flatMap and map calls, that's the moment where the nesting should trigger some thinking about how to cut the "noise". That begs for a simpler solution, doesn't it?
See the following and appreciate Scala (and its for-comprehension) yourself!
val lines = sc.textFile("file.txt")
val pairs = for {
line <- lines
Array(userid, rid) = line.split("\\|")
label <- getLabels(rid)
} yield ((userid, label), 1)
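To see what the for-comprehension stands for, here is a minimal sketch that uses a plain Scala List instead of an RDD, so it runs without Spark. The getLabels stub and the sample lines are made up for illustration and are not taken from the question; on a real RDD the exact desugaring may differ slightly.

```scala
object ForComprehensionSketch {
  // Hypothetical stand-in for the question's getLabels: maps a rid to its labels.
  def getLabels(rid: String): Seq[String] = Seq(s"$rid-a", s"$rid-b")

  // Assumed sample input in the userid|rid format.
  val lines: List[String] = List("jacek|1", "agata|2")

  // For-comprehension form, mirroring the snippet above.
  val viaFor = for {
    line <- lines
    Array(userid, rid) = line.split("\\|")
    label <- getLabels(rid)
  } yield ((userid, label), 1)

  // Roughly equivalent explicit form: an outer flatMap with an inner map.
  val viaFlatMap = lines.flatMap { line =>
    val Array(userid, rid) = line.split("\\|")
    getLabels(rid).map(label => ((userid, label), 1))
  }

  def main(args: Array[String]): Unit = {
    println(viaFor)
    println(viaFlatMap)
  }
}
```

Both values hold the same list of ((userid, label), 1) pairs; the for-comprehension just hides the nesting.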
If you throw Spark SQL into the mix, things get even simpler. Just to whet your appetite:
scala> pairs.toDF.show
+-----------------+---+
|               _1| _2|
+-----------------+---+
|        [jacek,1]|  1|
|[jacek,getLabels]|  1|
|        [agata,2]|  1|
|[agata,getLabels]|  1|
+-----------------+---+
I'm sure you can guess what was inside my file.txt file, can't you?
Upvotes: 1
Reputation: 11587
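Piecing together the information provided, it seems you will have to replace your foreach operation with a map operation. foreach returns Unit, whereas flatMap expects the function to return a collection, which is what the type mismatch error is complaining about.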
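val xRdd = sc.textFile("file.txt").flatMap { line =>
  // split returns an Array, so destructure with an Array pattern rather than a tuple
  val Array(userid, rid) = line.split("\\|")
  val labelsArr = getLabels(rid)
  // map returns the collection of pairs that flatMap then flattens
  labelsArr.map(i => ((userid, i), 1))
}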
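The difference between foreach and map can be checked without Spark. Below is a small sketch on a plain List; getLabels and the input lines are hypothetical placeholders, and the List only stands in for the RDD.

```scala
object FlatMapSketch {
  // Hypothetical stand-in for getLabels; the real one reads labels from another file.
  def getLabels(rid: String): Seq[String] = Seq(s"label-$rid-1", s"label-$rid-2")

  // Assumed sample input in the userid|rid format.
  val lines: List[String] = List("u1|r1", "u2|r2")

  // foreach runs the function for its side effects and discards the results,
  // so the whole expression has type Unit and cannot feed flatMap.
  val discarded: Unit = getLabels("r1").foreach(label => (("u1", label), 1))

  // map keeps the results, so each line yields a Seq of pairs
  // that flatMap can flatten into a single collection.
  val pairs: List[((String, String), Int)] = lines.flatMap { line =>
    val Array(userid, rid) = line.split("\\|")
    getLabels(rid).map(label => ((userid, label), 1))
  }

  def main(args: Array[String]): Unit =
    pairs.foreach(println)
}
```

Note that the Array pattern throws a MatchError on any line that does not split into exactly two fields, so real input may need validation first.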
Upvotes: 2