PixieDev

Reputation: 307

Why does spark-shell fail to load a file with a class when RDD is imported?

I use Spark 2.1.1 with Scala 2.11.8.

Inside spark-shell I use the :load command to load a class that has methods working with RDDs.

When I try to load the class I get the following compilation error:

error: not found: type RDD

Why? I've got the import statement.

[screenshot of the compilation error in spark-shell]

This is the code I'm working with:

[screenshot of the source file]
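A minimal sketch of what the file likely contained, inferred from the error output (the class and method names are illustrative, mirrored from the answer below):

import org.apache.spark.rdd.RDD

class Hello {
  def get(rdd: RDD[String]): RDD[String] = rdd
}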

Upvotes: 3

Views: 3132

Answers (2)

Bhavya Jain

Reputation: 621

This is a bug in spark-shell; see https://issues.apache.org/jira/browse/SPARK-22393. It was fixed in Spark 2.3.0, so please upgrade to Spark 2.3.0 (or later) or use the workaround suggested by @Jacek Laskowski.
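For reference, once you are on Spark 2.3.0 or later, the original style works: a file with the import at the top level loads fine with :load. A minimal sketch (the file name hello.scala mirrors the other answer):

// hello.scala — with SPARK-22393 fixed (Spark 2.3.0+), :load resolves a top-level import
import org.apache.spark.rdd.RDD

class Hello {
  def get(rdd: RDD[String]): RDD[String] = rdd
}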

Upvotes: 1

Jacek Laskowski

Reputation: 74619

That seems to be a quirk of :load in spark-shell. A solution is to move import org.apache.spark.rdd.RDD (the specific type, not the ._ wildcard) inside your class definition.

This does not seem specific to the RDD class; it affects any imported class. The load won't work unless the import statement is defined inside the class itself.

With that said, the following won't work because the import is outside the class.

import org.apache.spark.rdd.RDD
class Hello {
  def get(rdd: RDD[String]): RDD[String] = rdd
}

scala> :load hello.scala
Loading hello.scala...
import org.apache.spark.rdd.RDD
<console>:12: error: not found: type RDD
         def get(rdd: RDD[String]): RDD[String] = rdd
                                    ^
<console>:12: error: not found: type RDD
         def get(rdd: RDD[String]): RDD[String] = rdd
                      ^

You can see what happens under the covers using the -v flag of :load.

scala> :load -v hello.scala
Loading hello.scala...

scala>

scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD

scala> class Hello {
     |   def get(rdd: RDD[String]): RDD[String] = rdd
     | }
<console>:12: error: not found: type RDD
         def get(rdd: RDD[String]): RDD[String] = rdd
                                    ^
<console>:12: error: not found: type RDD
         def get(rdd: RDD[String]): RDD[String] = rdd
                      ^

That led me to guess that having the import inside the class definition could help. And it did! (to my great surprise)

class Hello {
  import org.apache.spark.rdd.RDD
  def get(rdd: RDD[String]): RDD[String] = rdd
}

scala> :load -v hello.scala
Loading hello.scala...

scala> class Hello {
     |   import org.apache.spark.rdd.RDD
     |   def get(rdd: RDD[String]): RDD[String] = rdd
     | }
defined class Hello
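Once the class is defined you can give it a quick try. A minimal check, assuming the default sc (SparkContext) that spark-shell provides:

val h = new Hello
// round-trips a small RDD through the method; collect returns Array(hello, world)
h.get(sc.parallelize(Seq("hello", "world"))).collect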

You could also use the :paste command to paste the class into spark-shell. There is also a so-called raw mode in which you can define classes in their own package.

package mypackage

class Hello {
  import org.apache.spark.rdd.RDD
  def get(rdd: RDD[String]): RDD[String] = rdd
}

scala> :load -v hello.scala
Loading hello.scala...

scala> package mypackage
<console>:1: error: illegal start of definition
package mypackage
^

scala>

scala> class Hello {
     |   import org.apache.spark.rdd.RDD
     |   def get(rdd: RDD[String]): RDD[String] = rdd
     | }
defined class Hello

scala> :paste -raw
// Entering paste mode (ctrl-D to finish)

package mypackage

class Hello {
  import org.apache.spark.rdd.RDD
  def get(rdd: RDD[String]): RDD[String] = rdd
}

// Exiting paste mode, now interpreting.
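With :paste -raw the class ends up in its own package, so refer to it by its fully qualified name. Again a minimal check assuming the default sc:

val h = new mypackage.Hello
// same round-trip as before; collect returns Array(a, b)
h.get(sc.parallelize(Seq("a", "b"))).collect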

Upvotes: 7
