Reputation: 53
I have Spark 1.0.0 from CDH5 installed on CentOS 6.2, and it runs without error.
When trying to run some Spark SQL I encounter an error. I start my Spark shell fine ...
spark-shell --master spark://mysparkserver:7077
Then I run one of the example Scala scripts from the Spark SQL Programming Guide:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val vehicle = sc.textFile("/tmp/scala.csv")
scala> val schemaString = "year manufacturer model class engine cylinders fuel consumption clkm hlkm cmpg hmpg co2lyr co2gkm"
scala> import org.apache.spark.sql._
scala> val schema =
  StructType(
    schemaString.split(" ").map(fieldName =>
      StructField(fieldName, StringType, true)))
But the import statement doesn't seem to have worked, because the last line gives this error:
scala> StructType
<console>:14: error: not found: value StructType
StructType
^
I do know that StructType is org.apache.spark.sql.api.java.StructType, and if I replace StructType in the schema line with the fully qualified name, the error changes.
Has anyone else encountered this error? Is there an extra step required that I am missing?
Upvotes: 3
Views: 6720
Reputation: 21
I have encountered this issue even in Spark 3.0.0.
Please use the import below:
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._
scala> val schema = StructType(Array(StructField("language", StringType, true), StructField("language", StringType, true)))
schema: org.apache.spark.sql.types.StructType = StructType(StructField(language,StringType,true), StructField(language,StringType,true))
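For completeness, here is a minimal sketch (assuming a Spark 3.x spark-shell, where a SparkSession named spark is already defined) of applying such a schema to build a DataFrame. The second field name (users) and the sample rows are made up for illustration.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Schema with two string fields (field names are illustrative)
val schema = StructType(Array(
  StructField("language", StringType, true),
  StructField("users", StringType, true)))

// A small in-memory RDD of Rows matching the schema
val rowRDD = spark.sparkContext.parallelize(Seq(
  Row("Scala", "3000"),
  Row("Java", "20000")))

// Apply the schema to get a DataFrame
val df = spark.createDataFrame(rowRDD, schema)
df.printSchema()
df.show()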
Upvotes: 1
Reputation: 3571
Your problem is that you are reading the programming guide for the latest version of Spark, and trying it out on Spark 1.0.0. Alas, org.apache.spark.sql.api.java.StructType
was introduced in Spark 1.1.0, as was the section on "Programmatically Specifying the Schema".
So, without upgrading, you're not going to be able to do this -- unless you can make use of the techniques in the 1.0.0 guide section "Running SQL on RDDs", which in 1.1.0 is called "Inferring the Schema Using Reflection". (Basically, if you can tolerate a fixed schema.) A sketch of that approach is below.
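Here is a minimal sketch of that reflection-based technique in the style of the 1.0.0 guide. The case class name, the subset of fields, and the example query are illustrative, not taken from the question.

// Infer the schema from a case class instead of building a StructType by hand
case class Vehicle(year: String, manufacturer: String, model: String)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit conversion from an RDD of case classes to a SchemaRDD

// Parse the CSV into Vehicle objects (only the first three columns shown here)
val vehicles = sc.textFile("/tmp/scala.csv")
  .map(_.split(","))
  .map(p => Vehicle(p(0), p(1), p(2)))

vehicles.registerAsTable("vehicles")  // renamed to registerTempTable in 1.1.0

// Run SQL against the registered table and print the results
val fords = sqlContext.sql("SELECT model FROM vehicles WHERE manufacturer = 'Ford'")
fords.map(r => "Model: " + r(0)).collect().foreach(println)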
If you look at the various documentation URLs, you want to replace latest with 1.0.0. When in doubt, I like to bring up multiple versions of the API docs and search. I notice that, like javadoc, scaladoc has a @since annotation for making this information clearer in API docs, but it isn't being used in the Spark API docs.
Upvotes: 3