jrook

Reputation: 3519

Cannot make Spark run inside a Scala worksheet in IntelliJ IDEA

The code below runs with no problems if I put it inside an object that extends the App trait and run it using IDEA's Run command.

However, when I try running it from a worksheet, I encounter one of these scenarios:

1- If the first line (the OuterScopes.addOuterScope call) is present, I get:

Task not serializable: java.io.NotSerializableException:A$A34$A$A34

2- If the first line (the OuterScopes.addOuterScope call) is commented out, I get:

Unable to generate an encoder for inner class A$A35$A$A35$A12 without access to the scope that this class was defined in.

//First line!
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

case class AClass(id: Int, f1: Int, f2: Int)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Test App")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("f1", IntegerType),
  StructField("f2", IntegerType)))

val df = spark.read.schema(schema)
  .option("header", "true")
  .csv("dataset.csv")

// Displays the content of the DataFrame to stdout
df.show()
val ads = df.as[AClass]

// This is the line that causes the serialization error
ads.foreach(x => println(x))

The project was created using IDEA's Scala plugin, and this is my build.sbt:

   ...
   scalaVersion := "2.10.6"
   scalacOptions += "-unchecked"
   libraryDependencies ++= Seq(
       "org.apache.spark" % "spark-core_2.10" % "2.1.0",
       "org.apache.spark" % "spark-sql_2.10" % "2.1.0",
       "org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
       )

I tried the solution in this answer, but it does not work in IDEA Ultimate 2017.1, which I am using. Also, when I use worksheets, I would prefer not to add an extra object to the worksheet if at all possible.

If I call collect() on the Dataset and get an Array of AClass instances, there are no more errors either. It is working with the Dataset directly that causes the error.
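
For reference, the collect() variant mentioned above looks roughly like the sketch below. It is only a sketch: the in-memory rows are made-up stand-ins for dataset.csv, and it assumes the same Spark 2.1.0 setup as the build.sbt above, with the addOuterScope line kept in place.

```scala
// Worksheet sketch; assumes Spark 2.1.0 on the classpath (see build.sbt above).
// The first line is kept so the encoder for the worksheet-defined case class resolves.
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

import org.apache.spark.sql.SparkSession

case class AClass(id: Int, f1: Int, f2: Int)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Test App")
  .getOrCreate()
import spark.implicits._

// Hypothetical in-memory rows standing in for dataset.csv
val ads = Seq((1, 10, 20), (2, 30, 40))
  .toDF("id", "f1", "f2")
  .as[AClass]

// collect() returns an Array[AClass] on the driver, so the println closure
// runs on a plain Scala array and is never shipped to Spark as a task.
ads.collect().foreach(println)
```

This only sidesteps the problem for data small enough to fit on the driver; it does not explain why ads.foreach itself fails in the worksheet.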

Upvotes: 4

Views: 876

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

Use Eclipse compatibility mode: open Preferences, type "scala", choose Scala under Languages & Frameworks, choose Worksheet, and select only "eclipse compatibility mode". See https://gist.github.com/RAbraham/585939e5390d46a7d6f8

Upvotes: 1
