Reputation: 155
I'm using the spark-core_2.10 jar in my Java Eclipse project. I can't find any reduceByKey method in it! All I get as suggestions for reduce are reduce and treeReduce. Any idea what's wrong here?
Upvotes: 0
Views: 1185
Reputation: 3316
reduceByKey works only on RDDs whose elements are key-value pairs; these are called pair RDDs.
Adding to the answers above, it doesn't matter whether you work in Scala or Java, as long as your dataset is correct. reduceByKey will work on tuple data in the following manner.
val l1 = List((1,2), (1,3), (4,2))
val l1RDD = sc.parallelize(l1)
l1RDD.reduceByKey(_+_)
output is: (1,5), (4,2)
Upvotes: 2
Reputation: 1483
Post your code and your RDD details. reduceByKey is part of PairRDD; if you have created a PairRDD, then you will see reduceByKey.
Upvotes: 0
Reputation: 740
In Java there is more hassle with PairRDD (compared to Scala, where the types are inferred automatically, or Python, which doesn't consider types and just expects tuples at runtime). Because reduceByKey needs to know the key, it is defined on the JavaPairRDD class.
You can get a JavaPairRDD from a normal RDD by calling JavaRDD#mapToPair. You provide a PairFunction that returns a tuple whose first element is taken as the key of the resulting JavaPairRDD.
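For example, a minimal sketch of that sequence of calls (assuming Java 8 lambdas and an illustrative class name, ReduceByKeyExample; with older Java you would pass an anonymous PairFunction instead):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ReduceByKeyExample {
    public static void main(String[] args) {
        // Local Spark context just for illustration.
        SparkConf conf = new SparkConf().setAppName("ReduceByKeyExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A plain JavaRDD of tuples: reduceByKey is NOT available here.
        JavaRDD<Tuple2<Integer, Integer>> rdd = sc.parallelize(
                Arrays.asList(new Tuple2<>(1, 2), new Tuple2<>(1, 3), new Tuple2<>(4, 2)));

        // mapToPair takes a PairFunction; the first element of the returned
        // tuple becomes the key of the resulting JavaPairRDD.
        JavaPairRDD<Integer, Integer> pairRdd = rdd.mapToPair(t -> new Tuple2<>(t._1(), t._2()));

        // reduceByKey now shows up and sums the values per key: (1,5), (4,2)
        pairRdd.reduceByKey((a, b) -> a + b)
               .collect()
               .forEach(System.out::println);

        sc.stop();
    }
}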
Upvotes: 2