Reputation: 1265
I want to use a ((String,String),BigDecimal) RDD as a PairRDD so that I can use the reduceByKey function, but Spark does not recognise the RDD as a PairRDD. Is there a way to perform the reduction on this RDD?
scala> jrdd2
jrdd2: org.apache.spark.rdd.RDD[((String, String), java.math.BigDecimal)] = MapPartitionsRDD[33] at map at <console>:30
scala> val jrdd3 = jrdd2.reduceBykey((a,b)=>(a.add(b),1))
<console>:28: error: value reduceBykey is not a member of org.apache.spark.rdd.RDD[((String, String), java.math.BigDecimal)]
val jrdd3 = jrdd2.reduceBykey((a,b)=>(a.add(b),1))
Upvotes: 0
Views: 410
Reputation: 13154
There are two problems here: the method is spelled reduceByKey (capital K in "Key", which is why the compiler says it is not a member), and the function you pass to it must return a BigDecimal, not a tuple, because reduceByKey cannot change the value type. Try this instead:
val rdd = sc.parallelize(Seq((("a", "b"), new java.math.BigDecimal(2)),
(("c", "d"), new java.math.BigDecimal(1)),
(("a", "b"), new java.math.BigDecimal(2))))
rdd.reduceByKey(_.add(_))
returns
((c,d),1)
((a,b),4)
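If the (a,b)=>(a.add(b),1) in the question was an attempt to keep a count of elements alongside the sum, reduceByKey cannot do that, because its result type must match the value type. A sketch using aggregateByKey instead, which allows a different accumulator type (here a hypothetical (sum, count) pair, assuming that is the desired result):

```scala
// Accumulator is (running sum, element count); zero value is (0, 0L).
val sumWithCount = rdd.aggregateByKey((java.math.BigDecimal.ZERO, 0L))(
  // seqOp: fold one BigDecimal value into the accumulator within a partition
  { case ((sum, count), v) => (sum.add(v), count + 1) },
  // combOp: merge two partial accumulators across partitions
  { case ((s1, c1), (s2, c2)) => (s1.add(s2), c1 + c2) }
)
```

With the sample data above this would yield ((a,b),(4,2)) and ((c,d),(1,1)).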
Upvotes: 3