hevensun

Reputation: 19

How to use RDD.flatMap?

I have a text file whose lines contain a userid and an rid separated by | (pipe). Each rid value corresponds to many labels in another file.

How can I use flatMap to implement a method as follows:

xRdd  = sc.textFile("file.txt").flatMap { line => 
  val (userid,rid) = line.split("\\|")
  val labelsArr = getLabels(rid)
  labelsArr.foreach{ i =>
    ((userid, i), 1)
  }
}

At compile time, I get an error:

type mismatch; found : Unit required: TraversableOnce[?]

Upvotes: 2

Views: 1871

Answers (2)

Jacek Laskowski

Reputation: 74759

This is exactly the reason why I said here and here that Scala's for-comprehension could make things easier. It should help you out too.

When you see a series of flatMap and map calls, the nesting should trigger some thinking about how to cut the "noise". That begs for simpler solutions, doesn't it?

See the following and appreciate Scala (and its for-comprehension) yourself!

val lines = sc.textFile("file.txt")
val pairs = for {
  line <- lines
  Array(userid, rid) = line.split("\\|")
  label <- getLabels(rid)
} yield ((userid, label), 1)
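
As a rough sketch of what the for-comprehension above amounts to, here is the same shape over a plain Scala Seq, so it runs without Spark. The lookup table, the sample lines, and this getLabels are hypothetical stand-ins, since the asker's real getLabels and files are not shown.

```scala
object ForComprehensionSketch {
  // Hypothetical lookup table standing in for the labels file.
  val labelsByRid: Map[String, Seq[String]] = Map(
    "1" -> Seq("a", "b"),
    "2" -> Seq("c")
  )

  // Hypothetical stand-in for the asker's getLabels.
  def getLabels(rid: String): Seq[String] =
    labelsByRid.getOrElse(rid, Seq.empty)

  // Made-up sample input in the userid|rid format from the question.
  val lines: Seq[String] = Seq("jacek|1", "agata|2")

  // For-comprehension form: one output element per (line, label) pair.
  val viaFor: Seq[((String, String), Int)] = for {
    line <- lines
    Array(userid, rid) = line.split("\\|")
    label <- getLabels(rid)
  } yield ((userid, label), 1)

  // Roughly the same computation written with explicit flatMap and map.
  val viaFlatMap: Seq[((String, String), Int)] = lines.flatMap { line =>
    val Array(userid, rid) = line.split("\\|")
    getLabels(rid).map(label => ((userid, label), 1))
  }
}
```

Both values hold the same pairs, which is the point: the for-comprehension is a flatter way of writing the nested flatMap and map.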

If you throw Spark SQL into the mix, things get even simpler. Just to whet your appetite:

scala> pairs.toDF.show
+-----------------+---+
|               _1| _2|
+-----------------+---+
|        [jacek,1]|  1|
|[jacek,getLabels]|  1|
|        [agata,2]|  1|
|[agata,getLabels]|  1|
+-----------------+---+

I'm sure you can guess what was inside my file.txt file, can't you?

Upvotes: 1

rogue-one

Reputation: 11587

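Piecing together the information provided, it seems you will have to replace your foreach operation with a map operation. foreach returns Unit, which is why the compiler reports found : Unit where flatMap requires a TraversableOnce.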

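val xRdd = sc.textFile("file.txt").flatMap { line =>
  // split returns an Array, so destructure with an Array pattern, not a tuple
  val Array(userid, rid) = line.split("\\|")
  val labelsArr = getLabels(rid)
  // map returns the new collection, which flatMap then flattens
  labelsArr.map(i => ((userid, i), 1))
}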
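
To see the difference in isolation, here is a small sketch on a plain Scala Seq, with no Spark needed. The labels and userid values are made up for illustration.

```scala
object ForeachVsMap {
  // Made-up sample data.
  val labels: Seq[String] = Seq("a", "b")
  val userid: String = "u1"

  // foreach runs the function for its side effects and returns Unit,
  // so the tuples built inside it are discarded.
  val viaForeach: Unit = labels.foreach(label => ((userid, label), 1))

  // map collects the function's results into a new collection,
  // which is the kind of value flatMap can flatten.
  val viaMap: Seq[((String, String), Int)] =
    labels.map(label => ((userid, label), 1))
}
```

Because the last expression of the function passed to flatMap is its return value, ending the block with foreach hands flatMap a Unit, while ending it with map hands it a collection.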

Upvotes: 2
