Reputation: 4790
I have a Cloudera VM running Spark 1.6.0.
I created a DataFrame from a CSV file and am now selecting columns and filtering rows with a WHERE clause:
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('file:///home/cloudera/sample.csv')
df.registerTempTable("closedtrips")
result = sqlContext.sql("SELECT consigner,`safety rating` as safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'")
However, the sql() call fails at runtime:
py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
: java.lang.RuntimeException: [1.96] failure: identifier expected
SELECT consigner,`safety rating` as safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'
^
Where am I going wrong here?
The query fails on the VM command line, but works fine when run in the Databricks environment.
Also, why are column names case sensitive in the VM? It fails to recognise 'trip frozen' because the actual column is 'Trip Frozen'. All of this works in Databricks and breaks in the VM.
Upvotes: 0
Views: 265
Reputation: 13821
In your VM, are you creating sqlContext as a SQLContext or as a HiveContext? In Databricks, the automatically-created sqlContext will always point to a HiveContext.
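A quick way to check in the VM's PySpark shell (a minimal sketch; the class names shown are what Spark 1.6 reports):

print(type(sqlContext))
# <class 'pyspark.sql.context.SQLContext'>  -> plain Spark SQL parser
# <class 'pyspark.sql.context.HiveContext'> -> HiveQL parser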
In Spark 2.0 this distinction between HiveContext and the regular SQLContext should not matter, because both have been subsumed by SparkSession, but in Spark 1.6 the two types of context differ slightly in how they parse SQL language input.
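If the VM gives you a plain SQLContext, creating a HiveContext explicitly should reproduce the Databricks behaviour. A minimal sketch for a standalone Spark 1.6 script launched with the spark-csv package, as in the question (in the PySpark shell, sc already exists, so skip the SparkContext line; the appName is hypothetical):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="closedtrips-example")  # hypothetical app name
sqlContext = HiveContext(sc)  # HiveQL parser: accepts '==' and matches identifiers case-insensitively

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('file:///home/cloudera/sample.csv')
df.registerTempTable("closedtrips")
result = sqlContext.sql("SELECT consigner, `safety rating` AS safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'")

Alternatively, writing the comparison with the standard SQL = instead of == keeps the query parseable by both context types.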
Upvotes: 2