Reputation: 173
I'm reading a CSV file in Spark 2.0 and counting the non-null values in a column with the following:
val df = spark.read.option("header", "true").csv(dir)
df.filter("IncidntNum is not null").count()
This works fine when I test it in spark-shell. But when I build a jar containing the code and run it with spark-submit, I get an exception at the second line above:
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '' expecting {'(', 'SELECT', ..
== SQL ==
IncidntNum is not null
^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)
Any idea why this happens when the same code works in spark-shell?
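For what it's worth, the same filter can also be written with the typed Column API, which bypasses the SQL expression parser entirely (a minimal sketch, using the same df as above):

import org.apache.spark.sql.functions.col

// Equivalent count, expressed through the Column API instead of a SQL string
df.filter(col("IncidntNum").isNotNull).count()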
Upvotes: 2
Views: 4134
Reputation: 2111
This question has been sitting around a while, but better late than never.
The most likely reason I can think of is that with spark-submit you are running in "cluster" deploy mode, so the driver process runs on a cluster node rather than on your local machine (spark-shell always runs the driver locally, in client mode). If dir is a local path, the driver would resolve it on that other machine and could end up reading a different file, which would explain why the same expression parses in one environment and not the other.
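One quick way to check (a diagnostic sketch under that assumption; dir is the same path variable from the question): log which machine the driver is actually on, and peek at the raw file before the CSV parser touches it.

// Print the host the driver process is running on
println(s"Driver host: ${java.net.InetAddress.getLocalHost.getHostName}")

// Read the file as plain text, with no CSV parsing, and show the first lines
val raw = spark.read.textFile(dir)
raw.take(3).foreach(println)

If the header or first rows differ from what spark-shell shows, the deploy mode is the culprit; passing --deploy-mode client to spark-submit should then reproduce the spark-shell behaviour.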
Upvotes: 1