Irene

Reputation: 792

Spark 1.2 SQL code does not work with Spark 1.3 SQL code

I have used so far this build.sbt in the local package directory

name := "spark27_02"

version := "1.0"

scalaVersion := "2.10.4"

sbtVersion := "0.13.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.2.1"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.2.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.5.0"

I wanted to try out the 1.3.0 version that just came out, so I switched all the packages to 1.3.0. Spark core compiles, but Spark SQL does not, so I checked Maven Central, which suggests using

libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.3.0"

but it still does not work. I do run sbt update from the sbt shell. By the way, I am using Scala 2.10.4.
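For what it's worth, with scalaVersion := "2.10.4" the %% operator appends the _2.10 suffix automatically, so the following line (shown only for comparison) should resolve to the same artifact as the one Maven Central suggests:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0"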

What silly thing am I doing wrong?

Any help is appreciated.

EDIT: following the example on the Spark webpage, with this build.sbt

name := "Marzia2"

version := "1.0"

scalaVersion := "2.10.4"

sbtVersion := "0.13.7"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.3.0"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.3.0"

libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.3.0"

doing

sbt package 

I get

[info] Compiling 1 Scala source to /home/cloudera/IdeaProjects/Marzia2/target/scala-2.10/classes...
[error] /home/cloudera/IdeaProjects/Marzia2/src/main/scala/prova_sql.scala:35: value createSchemaRDD is not a member of org.apache.spark.sql.SQLContext
[error]     import sqlContext.createSchemaRDD
[error]            ^
[error] /home/cloudera/IdeaProjects/Marzia2/src/main/scala/prova_sql.scala:38: value registerTempTable is not a member of org.apache.spark.rdd.RDD[prova_sql.Person]
[error]     people.registerTempTable("people")
[error]            ^
[error] two errors found
[error] (compile:compile) Compilation failed

and if I use the new features, such as the implicits, when defining the Spark SQL context, I still get an error about the method not being a member of the SQL context.
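For reference, the failing lines come from a 1.2-style prova_sql.scala modeled on the Spark SQL example; here is a minimal sketch reconstructed from the error messages (the file path and the Person fields are assumptions):

case class Person(name: String, age: Int)

object prova_sql {
  def main(args: Array[String]): Unit = {
    val sc = new org.apache.spark.SparkContext("local[*]", "prova_sql")
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    // Both of the following compile against 1.2.1 but fail against 1.3.0:
    import sqlContext.createSchemaRDD

    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    people.registerTempTable("people")
  }
}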

There must be some stupid error somewhere.

Upvotes: 3

Views: 3705

Answers (1)

Justin Pihony

Reputation: 67075

One part of the problem is that SchemaRDD became a DataFrame. Realistically, you should use

import sqlContext._

instead of the specific import, as it will future-proof you against implicit changes, but if you really want then you can use

import sqlContext.implicits._

BUT, the second part is that 1.3.0 broke compatibility and is now locked in from an API perspective, so you now need to do the following:

  • The implicits are not as full-blown as in 1.2. To use them, you now have to do: rdd.toDF().registerTempTable("xyz")

Note the toDF() call; a sketch follows below.
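Putting that together, a minimal sketch of the 1.3-style version of the failing code (reusing the Person case class and the assumed file path from the question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object prova_sql {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("prova_sql"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._  // brings toDF() into scope for RDDs of case classes/tuples

    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    // 1.3: convert the RDD to a DataFrame explicitly before registering it
    people.toDF().registerTempTable("people")

    sqlContext.sql("SELECT name FROM people WHERE age >= 13").collect().foreach(println)
  }
}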

Now that the API is locked in, I cannot think of a way to add the more intuitive implicit back in. You would end up with conflicting implicit definitions in the case of import sqlContext._, and nested implicits are not supported in Scala.

From the migration guide:

Additionally, the implicit conversions now only augment RDDs that are composed of Products (i.e., case classes or tuples) with a method toDF, instead of applying automatically.

Upvotes: 6
