user2535982

Spark cannot find the postgres jdbc driver

EDIT: See the edit at the end

First of all, I am using Spark 1.5.2 on Amazon EMR with Amazon RDS for my Postgres database. Second, I am a complete newbie to Spark, Hadoop and MapReduce.

Essentially, my problem is the same as in this question: java.sql.SQLException: No suitable driver found when loading DataFrame into Spark SQL

The DataFrame loads, but when I try to evaluate it (calling df.show(), where df is the DataFrame), I get this error:

java.sql.SQLException: No suitable driver found for jdbc:postgresql://mypostgres.cvglvlp29krt.eu-west-1.rds.amazonaws.com:5432/mydb

I should note that I start Spark like this:

spark-shell --driver-class-path /home/hadoop/postgresql-9.4.1207.jre7.jar

The answers there suggest copying the JAR onto the worker nodes and setting the classpath on them somehow, which I don't really understand how to do. They also say the issue was apparently fixed in Spark 1.4, but I'm using 1.5.2 and still have it, so what is going on?

EDIT: It looks like I resolved the issue, but I still don't quite understand why this works when the command above doesn't. So my question is now: why does running this:

spark-shell --driver-class-path /home/hadoop/postgresql-9.4.1207.jre7.jar --conf spark.driver.extraClassPath=/home/hadoop/postgresql-9.4.1207.jre7.jar --jars /home/hadoop/postgresql-9.4.1207.jre7.jar

solve the problem? It seems I just passed the same path to a couple more flags.

Upvotes: 1

Views: 6650

Answers (1)

zero323

Reputation: 330063

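spark-shell --driver-class-path .... --jars ... works because all JAR files listed in --jars are automatically distributed over the cluster and added to the executors' classpaths. --driver-class-path (and spark.driver.extraClassPath, which is just its configuration-property form) only puts the JAR on the driver's classpath, while the JDBC connections that df.show() needs are opened by the executors.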
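To see why each JVM needs the JAR on its own classpath, here is a minimal plain-JDBC sketch (no Spark involved; the object and method names are made up for illustration). java.sql.DriverManager can only return a connection if a driver loaded in the current JVM accepts the URL. On a JVM without the PostgreSQL JAR, it fails with the same "No suitable driver found" message seen in the question.

```scala
import java.sql.{DriverManager, SQLException}
import scala.util.Try

// Hypothetical helper, for illustration only: plain JDBC, no Spark involved.
object DriverCheck {
  // True if this JVM's classpath can load the given class.
  def driverOnClasspath(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  // Tries to open (and immediately close) a connection; returns the error message, if any.
  // If no registered driver accepts the URL, DriverManager throws
  // SQLException("No suitable driver found for <url>") before any network I/O.
  def connectionError(url: String): Option[String] =
    try {
      DriverManager.getConnection(url).close()
      None
    } catch {
      case e: SQLException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    val url = "jdbc:postgresql://localhost:5432/mydb" // placeholder URL
    println(s"org.postgresql.Driver on classpath: ${driverOnClasspath("org.postgresql.Driver")}")
    println(s"connection error: ${connectionError(url).getOrElse("none")}")
  }
}
```

Running it once with and once without the PostgreSQL JAR on the classpath shows the difference. In Spark the same rule applies separately to the driver and to every executor that opens a connection.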

Alternatively, you could use

spark-shell --packages org.postgresql:postgresql:9.4.1207.jre7

and specify the driver class as an option for DataFrameReader / DataFrameWriter:

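// url is the connection string from the question; "mytable" is a placeholder table name
val url = "jdbc:postgresql://mypostgres.cvglvlp29krt.eu-west-1.rds.amazonaws.com:5432/mydb"
val table = "mytable"
val df = sqlContext.read.format("jdbc").options(Map(
  "url" -> url, "dbtable" -> table, "driver" -> "org.postgresql.Driver"
)).load()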

or even manually copy the required JARs to the workers and place them somewhere on the CLASSPATH.
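As for naming the driver class in the "driver" option, a rough plain-JDBC analogy is to load the driver class explicitly before asking DriverManager for a connection. JDBC drivers register themselves with DriverManager when their class is loaded, and if the JAR is missing you get a ClassNotFoundException that names the class instead of the vaguer "No suitable driver found". This is a sketch of the idea under that assumption, not Spark's actual implementation of the option, and connectWith is a hypothetical name.

```scala
import java.sql.{Connection, DriverManager}
import scala.util.Try

// Hypothetical helper: a plain-JDBC analogy for naming the driver class
// explicitly. It is NOT Spark's implementation of the "driver" option.
object ExplicitDriver {
  // Loads driverClass first, so a missing JAR surfaces as a
  // ClassNotFoundException naming the class, rather than a generic
  // "No suitable driver found" from DriverManager.
  // On success the caller is responsible for closing the Connection.
  def connectWith(url: String, driverClass: String): Try[Connection] = Try {
    Class.forName(driverClass) // loading the class lets the driver register itself
    DriverManager.getConnection(url)
  }
}
```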

Upvotes: 5
