Reputation: 2480
When I call RDD.mapValues(...).reduceByKey(...), my code does not compile. But when I reverse the order, RDD.reduceByKey(...).mapValues(...), the code does compile. The types appear to match up.
A complete minimal reproducing example is:
import org.apache.spark.SparkContext

def test[E]() =
  new SparkContext().textFile("")
    .keyBy(_ ⇒ 0L)
    .mapValues(_.asInstanceOf[E])
    .reduceByKey((x, _) ⇒ x)
The compilation error is the same as in this question but its remedy doesn't help:
Test.scala:7: error: value reduceByKey is not a member of org.apache.spark.rdd.RDD[(Long, E)]
possible cause: maybe a semicolon is missing before `value reduceByKey'?
.reduceByKey((x, _) ⇒ x)
This issue seems to be more at the Scala level than the Spark level. Replacing the type parameter with a concrete type such as Int works, so it might be a problem with type inference. I am using Spark 2.2.0 with Scala 2.11.
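For comparison, here is a minimal sketch of the same chain with the type parameter replaced by the concrete type Int (the name testConcrete is mine; I am assuming the same Spark 2.2.0 / Scala 2.11 setup). This version compiles:
import org.apache.spark.SparkContext

// Same pipeline as above, but with a concrete value type instead of E
def testConcrete() =
  new SparkContext().textFile("")
    .keyBy(_ ⇒ 0L)
    .mapValues(_.asInstanceOf[Int])
    .reduceByKey((x, _) ⇒ x)  // value reduceByKey is found here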
Upvotes: 0
Views: 361
Reputation: 18424
Methods such as .reduceByKey and .mapValues are not members of RDD itself; they are members of PairRDDFunctions, but you can call them because there is an implicit conversion from RDD[(K, V)] to PairRDDFunctions[K, V]. If you look closely at the definition of that conversion, you can spot the problem:
implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
(implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V]
It requires a ClassTag instance for both the K and V types. In your example, none is available for E, so the implicit conversion can't be applied, and the reduceByKey method isn't found. Try this:
def test[E]()(implicit et: ClassTag[E]) = ...
or the shorthand:
def test[E : ClassTag]() = ...
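Putting it together, a minimal sketch of the reproducer from the question with the context bound added (assuming the same Spark 2.2.0 / Scala 2.11 setup); the body is unchanged, only the signature differs:
import org.apache.spark.SparkContext
import scala.reflect.ClassTag

// E now carries a ClassTag, so rddToPairRDDFunctions can be applied
// to the RDD[(Long, E)] produced by mapValues.
def test[E: ClassTag]() =
  new SparkContext().textFile("")
    .keyBy(_ ⇒ 0L)
    .mapValues(_.asInstanceOf[E])
    .reduceByKey((x, _) ⇒ x)  // now resolves via PairRDDFunctions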
Upvotes: 2