Reputation: 25416
I am trying to map an RDD to a pair RDD in Scala so that I can use reduceByKey later. Here is what I did:
userRecords is of type org.apache.spark.rdd.RDD[UserElement].
I try to create a pair RDD from userRecords like below:
val userPairs: PairRDDFunctions[String, UserElement] = userRecords.map { t =>
  val nameKey: String = t.getName()
  (nameKey, t)
}
However, I got the error:
type mismatch; found : org.apache.spark.rdd.RDD[(String, com.mypackage.UserElement)] required: org.apache.spark.rdd.PairRDDFunctions[String,com.mypackage.UserElement]
What am I missing here? Thanks a lot!
Upvotes: 4
Views: 12108
Reputation: 3374
You can also use the keyBy method, which takes a function that extracts the key from each element.
In your example, you can simply write userRecords.keyBy(t => t.getName()), which gives you an RDD[(String, UserElement)].
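For completeness, here is a rough end-to-end sketch of keyBy followed by reduceByKey. The UserElement case class, its visits field, the sample data, and the local master setting are all assumptions made up for illustration, since the real com.mypackage.UserElement is not shown in the question. The SparkContext._ import is only needed on Spark versions before 1.3; later versions find the implicit conversion without it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // only required on Spark < 1.3
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for com.mypackage.UserElement (not the asker's real class)
case class UserElement(name: String, visits: Int) {
  def getName(): String = name
}

object KeyByExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("keyBy-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val userRecords: RDD[UserElement] = sc.parallelize(Seq(
        UserElement("alice", 1),
        UserElement("bob", 2),
        UserElement("alice", 3)))

      // keyBy returns a plain RDD of tuples, so annotate it as RDD[(K, V)],
      // not as PairRDDFunctions[K, V]
      val userPairs: RDD[(String, UserElement)] = userRecords.keyBy(t => t.getName())

      // reduceByKey becomes available through the implicit conversion
      // to PairRDDFunctions; here it sums the hypothetical visits field per name
      val merged: RDD[(String, UserElement)] =
        userPairs.reduceByKey((a, b) => UserElement(a.name, a.visits + b.visits))

      merged.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The point is that the value stays typed as RDD[(String, UserElement)] throughout, and the pair operations show up on it automatically.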
Upvotes: 1
Reputation: 8996
I think you are just missing the import of org.apache.spark.SparkContext._. This brings all the right implicit conversions into scope to create the pair RDD.
The example below should work (assuming you have initialized a SparkContext under sc):
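import org.apache.spark.SparkContext._
import org.apache.spark.rdd.PairRDDFunctions
val f = sc.parallelize(Array(1, 2, 3, 4, 5))
// The expected type triggers the implicit conversion from RDD[(String, Int)]
val g: PairRDDFunctions[String, Int] = f.map(x => (x.toString, x))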
Upvotes: 2
Reputation: 67135
You don't need to do that, as it is done via implicits (specifically rddToPairRDDFunctions). Any RDD whose elements are of type Tuple2[K, V] can automatically be used as a PairRDDFunctions. If you REALLY want to, you can explicitly do what the implicit does and wrap the RDD in a PairRDDFunctions:
val pair = new PairRDDFunctions(rdd)
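If the implicit mechanism itself is the confusing part, a toy model in plain Scala may help. This is only a sketch and not Spark's actual code: MiniRDD, MiniPairFunctions, and toPairFunctions are made-up names that loosely mirror RDD, PairRDDFunctions, and rddToPairRDDFunctions.

```scala
import scala.language.implicitConversions

// Toy stand-in for RDD (hypothetical, not a Spark class)
class MiniRDD[T](val data: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(data.map(f))
}

// Toy stand-in for PairRDDFunctions: a wrapper that adds key-based operations
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def reduceByKey(f: (V, V) => V): MiniRDD[(K, V)] = {
    val reduced = self.data
      .groupBy(_._1)
      .map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }
    new MiniRDD(reduced.toSeq)
  }
}

object MiniRDD {
  // Lives in the companion object, so the compiler finds it without an import
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

object ImplicitDemo {
  def main(args: Array[String]): Unit = {
    // map returns a MiniRDD of tuples, never a MiniPairFunctions
    val pairs: MiniRDD[(String, Int)] =
      new MiniRDD(Seq(1, 2, 3, 4)).map(x => (if (x % 2 == 0) "even" else "odd", x))

    // The compiler rewrites this call to MiniRDD.toPairFunctions(pairs).reduceByKey(...)
    val sums = pairs.reduceByKey(_ + _)
    println(sums.data.toMap)
  }
}
```

The value you hold is always the RDD-like type; the wrapper only appears for the duration of a method call. That is why annotating the result of map as the wrapper type fails unless a matching implicit conversion is in scope.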
Upvotes: 7