Reputation: 4790
I have a Cloudera VM running Spark 1.6.0.
I created a DataFrame from a CSV file and am now selecting columns and filtering rows with a WHERE clause:
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('file:///home/cloudera/sample.csv')
df.registerTempTable("closedtrips")
result = sqlContext.sql("SELECT consigner,`safety rating` as safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'")
However, the sql() call fails at runtime:
py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
: java.lang.RuntimeException: [1.96] failure: identifier expected
SELECT consigner,`safety rating` as safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'
^
Where am I going wrong here?
The query fails on the VM command line, but works fine when run in the Databricks environment.
Also, why are column names case sensitive in the VM? It fails to recognise 'trip frozen' because the actual column is 'Trip Frozen'. All of this works in Databricks and breaks in the VM.
Upvotes: 0
Views: 265
Reputation: 13821
In your VM, are you creating sqlContext as a SQLContext or as a HiveContext? In Databricks, the automatically-created sqlContext will always point to a HiveContext.
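A quick way to check in the VM's PySpark shell (a minimal sketch; the class names shown are what Spark 1.6 reports):

print(type(sqlContext))
# <class 'pyspark.sql.context.SQLContext'>  -> plain Spark SQL parser
# <class 'pyspark.sql.context.HiveContext'> -> HiveQL parser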
In Spark 2.0 this distinction between HiveContext and the regular SQLContext should not matter, because both have been subsumed by SparkSession, but in Spark 1.6 the two types of context differ slightly in how they parse SQL language input.
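If the VM gives you a plain SQLContext, creating a HiveContext explicitly should reproduce the Databricks behaviour. A minimal sketch for a standalone Spark 1.6 script launched with the spark-csv package, as in the question (in the PySpark shell, sc already exists, so skip the SparkContext line; the appName is hypothetical):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="closedtrips-example")  # hypothetical app name
sqlContext = HiveContext(sc)  # HiveQL parser: accepts '==' and matches identifiers case-insensitively

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('file:///home/cloudera/sample.csv')
df.registerTempTable("closedtrips")
result = sqlContext.sql("SELECT consigner, `safety rating` AS safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'")

Alternatively, writing the comparison with the standard SQL = instead of == keeps the query parseable by both context types.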
Upvotes: 2