insomniac

Reputation: 155

can't find reduceByKey method on spark

I'm using the spark-core_2.10 jar in my Java Eclipse project. I can't find any reduceByKey method in it! All I get as suggestions for reduce are reduce and treeReduce. Any idea what's wrong here?

Upvotes: 0

Views: 1185

Answers (3)

antonpuz

Reputation: 3316

reduceByKey works only on RDDs that contain key-value data; these are called pair RDDs.

Adding to the other answers, it doesn't matter whether you work in Scala or Java, as long as your dataset has the right shape.

reduceByKey works on tuple (key-value) data in the following manner:

val l1 = List((1, 2), (1, 3), (4, 2))  // key-value pairs
val l1RDD = sc.parallelize(l1)         // RDD[(Int, Int)], a pair RDD
l1RDD.reduceByKey(_ + _)               // sums the values for each key

Output: (1,5) (4,2)

Upvotes: 2

Indrajit Swain

Reputation: 1483

Post your code and your RDD details. reduceByKey is part of PairRDD; if you have created a PairRDD, then you will see reduceByKey.

Upvotes: 0

Martin Milichovsky

Reputation: 740

In Java there is more hassle with pair RDDs (compared to Scala, where the types are inferred automatically, or Python, which doesn't consider types and expects tuples at runtime). Since reduceByKey needs to know the key, it is defined on the JavaPairRDD class.

You can get a JavaPairRDD from a normal RDD by calling JavaRDD#mapToPair. You provide a PairFunction that returns a tuple whose first element is taken as the key in the resulting JavaPairRDD.
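For illustration, here is a minimal sketch of that pattern, assuming Java 8 lambdas and a local master (the class name, input data, and variable names are made up for the example):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ReduceByKeyExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("reduceByKeyExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> words = sc.parallelize(Arrays.asList("a", "b", "a"));

        // mapToPair turns the plain JavaRDD into a JavaPairRDD,
        // which is where reduceByKey is defined. The first element
        // of each Tuple2 becomes the key.
        JavaPairRDD<String, Integer> counts = words
                .mapToPair(w -> new Tuple2<>(w, 1))
                .reduceByKey((x, y) -> x + y);

        counts.collect().forEach(System.out::println);  // (a,2) (b,1)
        sc.stop();
    }
}

Once you have the JavaPairRDD, Eclipse's autocomplete will offer reduceByKey alongside the other pair operations.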

Upvotes: 2
