Hardik Gupta

Reputation: 4790

Error inside where clause while comparing items in Spark SQL

I have a Cloudera VM running Spark version 1.6.0.

I created a DataFrame from a CSV file and am now filtering rows based on a WHERE clause:

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('file:///home/cloudera/sample.csv')
df.registerTempTable("closedtrips")

result = sqlContext.sql("SELECT id,`safety rating` as safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'")

However, it gives me a runtime error on the SQL line:

py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
: java.lang.RuntimeException: [1.96] failure: identifier expected

SELECT consigner,`safety rating` as safety_rating, route FROM closedtrips WHERE `trip frozen` == 'YES'
                                                                                               ^

Where am I going wrong here?

The above command fails on the VM command line, but works fine when run in the Databricks environment.

Also, why are column names case-sensitive in the VM? It fails to recognise 'trip frozen' because the actual column is 'Trip Frozen'. All of this works fine in Databricks and breaks in the VM.

Upvotes: 0

Views: 265

Answers (1)

Josh Rosen

Reputation: 13821

In your VM, are you creating sqlContext as a SQLContext or as a HiveContext?
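You can check directly from the PySpark shell; the output below is illustrative only, showing the two possibilities:

type(sqlContext)
# <class 'pyspark.sql.context.SQLContext'>   -> plain SQLContext, stricter built-in SQL parser
# <class 'pyspark.sql.context.HiveContext'>  -> HiveContext, HiveQL parser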

In Databricks, the automatically-created sqlContext will always point to a HiveContext.

In Spark 2.0 this distinction between HiveContext and regular SQLContext should not matter because both have been subsumed by SparkSession, but in Spark 1.6 the two types of contexts differ slightly in how they parse SQL language input.
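If your VM session only has a plain SQLContext, a minimal sketch of switching to a HiveContext in Spark 1.6 is below, assuming it runs as a standalone script (in the shell, sc already exists); the file path and table name are taken from the question, and single = is used instead of == since it is accepted by both parsers:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="closedtrips-filter")
sqlContext = HiveContext(sc)  # HiveQL parser handles the backticked column names

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('file:///home/cloudera/sample.csv')
df.registerTempTable("closedtrips")

# Single = is the portable equality operator across both parsers
result = sqlContext.sql("SELECT id, `safety rating` AS safety_rating, route FROM closedtrips WHERE `trip frozen` = 'YES'")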

Upvotes: 2
