Reputation: 60014
I get this error:
value join is not a member of
org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
forSome { type _0 <: (String, Double) }]
The only suggestion I found is import org.apache.spark.SparkContext._
I am already doing that.
What am I doing wrong?
EDIT: changing the code to eliminate forSome (i.e., so that the object has type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) solved the problem.
Is this a bug in Spark?
Upvotes: 4
Views: 17598
Reputation: 51
Consider two Spark RDDs that are to be joined together.
Say rdd1.first is of the form (Int, Int, Float) = (1,957,299.98), while rdd2.first is something like (Int, Int) = (25876,1), and the join is supposed to take place on the first field of both RDDs.
scala> rdd1.join(rdd2)
This results in an error:
error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]
REASON
Both RDDs must be in the form of key-value pairs, that is, RDD[(K, V)].
Here rdd1, whose elements are 3-tuples such as (1,957,299.98), does not obey this rule, while rdd2, whose elements are pairs such as (25876,1), does.
RESOLUTION
Convert the elements of the first RDD from 3-tuples such as (1,957,299.98) to key-value pairs of the form (1,(957,299.98)) before joining it with rdd2, as shown below:
scala> val rdd1KV = rdd1.map { case (k, a, b) => (k, (a, b)) } // re-keyed RDD: first field is the key
scala> rdd1KV.first
res**: (Int, (Int, Float)) = (1,(957,299.98))
scala> rdd1KV.join(rdd2) // the join now compiles
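By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions, which an RDD of pairs gains through an implicit conversion. On Spark versions before 1.3 that conversion has to be brought into scope with import org.apache.spark.SparkContext._, so make sure the import is present in Eclipse or whichever IDE you run your code from.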
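The re-keying step can also be tried end to end in local mode. The following is only a minimal sketch: it assumes spark-core (1.3 or later) is on the classpath, the object name JoinByFirstField is made up for illustration, and the row (1, 7) in rdd2 is an invented sample added so that the join has a matching key.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinByFirstField {
  // Builds two small RDDs, re-keys the first one and joins on the first field.
  def run(): Array[(Int, ((Int, Float), Int))] = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("join-sketch")
    val sc = new SparkContext(conf)
    try {
      // rdd1 holds 3-tuples, so it is NOT a pair RDD and has no join method.
      val rdd1 = sc.parallelize(Seq((1, 957, 299.98f)))
      // rdd2 holds pairs; (1, 7) is a hypothetical row with a matching key.
      val rdd2 = sc.parallelize(Seq((25876, 1), (1, 7)))

      // Re-key: first field becomes the key, the remaining fields the value.
      val rdd1KV = rdd1.map { case (k, a, b) => (k, (a, b)) }

      // RDD[(Int, (Int, Float))] is a pair RDD, so join is available.
      rdd1KV.join(rdd2).collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

On Spark versions before 1.3 the same code would additionally need import org.apache.spark.SparkContext._ for the implicit conversion.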
Article also on my blog:
https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html
Upvotes: 1
Reputation: 27455
join is a member of org.apache.spark.rdd.PairRDDFunctions. So why does the implicit conversion not trigger?
scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
--- because ---
argument expression's type is not compatible with formal parameter type;
found : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
val p = new org.apache.spark.rdd.PairRDDFunctions(r)
^
<console>:25: error: type mismatch;
found : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
required: org.apache.spark.rdd.RDD[(K, V)]
val p = new org.apache.spark.rdd.PairRDDFunctions(r)
I'm sure that error message is clear to everyone else, but just for my own slow self let's try to make sense of it. PairRDDFunctions has two type parameters, K and V. Your forSome is for the whole pair, so it cannot be split into separate K and V types. There are no K and V for which RDD[(K, V)] would equal your RDD type, and because RDD is invariant, nothing short of equality will do.
However, you could have the forSome apply only to the value, instead of the whole pair. Join works now, because this type can be separated into K and V.
scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26
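The same inference behaviour can be reproduced without Spark. The sketch below is a hedged model, not Spark's actual code: Box, PairOps, toPairOps and selfJoin are hypothetical stand-ins for RDD, PairRDDFunctions, the implicit conversion and join, and the nested type is shortened to (Int, Array[_0]). It assumes Scala 2, since forSome was removed in Scala 3.

```scala
import scala.language.existentials
import scala.language.implicitConversions

object ExistentialPairSketch {
  // Stand-in for RDD: invariant in T, like org.apache.spark.rdd.RDD.
  final class Box[T](val items: Seq[T])

  // Stand-in for PairRDDFunctions: it needs separate K and V.
  final class PairOps[K, V](self: Box[(K, V)]) {
    def selfJoin: Seq[(K, (V, V))] =
      for {
        (k1, v1) <- self.items
        (k2, v2) <- self.items
        if k1 == k2
      } yield (k1, (v1, v2))
  }

  // Stand-in for the implicit conversion that adds join to pair RDDs.
  implicit def toPairOps[K, V](b: Box[(K, V)]): PairOps[K, V] = new PairOps(b)

  // forSome scoped to the value only: the element type is a plain (Long, Inner).
  type Inner = (Int, Array[T]) forSome { type T <: (String, Double) }
  // forSome scoped to the whole pair: no K and V make (K, V) equal to this type.
  type WholePair = (Long, (Int, Array[T])) forSome { type T <: (String, Double) }

  private val scores: Array[(String, Double)] = Array(("a", 1.0))
  private val concrete: (Int, Array[(String, Double)]) = (7, scores)
  private val inner: Inner = concrete
  private val pair: (Long, (Int, Array[(String, Double)])) = (1L, concrete)

  val valueOnly: Box[(Long, Inner)] = new Box[(Long, Inner)](Seq((1L, inner)))
  val whole: Box[WholePair] = new Box[WholePair](Seq[WholePair](pair))

  // Compiles: K = Long and V = Inner can be inferred.
  val joined: Seq[(Long, (Inner, Inner))] = valueOnly.selfJoin

  // Expected to be rejected by the compiler, for the reason given in the
  // answer above (left commented out so that the sketch compiles):
  // whole.selfJoin

  def main(args: Array[String]): Unit =
    println(s"joined rows: ${joined.size}")
}
```

The data in valueOnly and whole is the same; only the position of forSome in the declared element type differs, which is what decides whether the conversion can apply.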
Upvotes: 8