Georg Heiler

Reputation: 17676

Apache Spark JDBC connection read/write driver missing

Hi, there are already many questions out there regarding this topic; the solution always was:

I set up a minimal example here: https://github.com/geoHeil/sparkJDBCHowTo, trying both methods, but neither worked for me. I am getting java.sql.SQLException: No suitable driver.
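For reference, the read in the example is essentially the following (a minimal sketch with placeholder URL, table, and credentials, not the exact code from the repo):

import java.util.Properties
import org.apache.spark.sql.SQLContext

// Minimal sketch of the failing read; URL, table, and credentials are
// placeholders. Without the driver jar visible to java.sql.DriverManager
// (or an explicit "driver" property), this throws:
//   java.sql.SQLException: No suitable driver
def readTable(sqlContext: SQLContext) = {
  val props = new Properties()
  props.put("user", "dbuser")
  props.put("password", "secret")
  sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", "mytable", props)
}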

Upvotes: 0

Views: 574

Answers (2)

Naveen Kumar

Reputation: 1

That’s pretty straightforward. To connect to an external database and retrieve data into Spark DataFrames, an additional jar file is required.

For example, with MySQL the JDBC driver is required. Download the driver package and extract mysql-connector-java-x.yy.zz-bin.jar to a path that is accessible from every node in the cluster, preferably on a shared file system. For example, with a Pouta Virtual Cluster such a path would be under /shared_data; here I use /shared_data/thirdparty_jars/.

With direct Spark job submissions from the terminal, one can specify the --driver-class-path argument, pointing to extra jars that should be provided to workers with the job. However, this does not work with this approach, so we must configure these paths for the front-end and worker nodes in the spark-defaults.conf file, usually located in the /opt/spark/conf directory.

Place the jar for whichever database server you are using in spark-defaults.conf:

spark.driver.extraClassPath /your-path/mysql-connector-java-5.1.35-bin.jar

spark.executor.extraClassPath /your-path/mysql-connector-java-5.1.35-bin.jar
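Once the connector jar is on both classpaths, a plain JDBC read should work without any extra properties. A minimal sketch, assuming the configuration above takes effect; the URL, table, and credentials are placeholders:

import org.apache.spark.sql.SQLContext

// Sketch of a MySQL read once the jar is on driver and executor classpaths.
def readFromMySql(sqlContext: SQLContext) = {
  sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb") // assumed URL
    .option("dbtable", "mytable")
    .option("user", "dbuser")
    .option("password", "secret")
    .load()
}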

Upvotes: 0

Georg Heiler

Reputation: 17676

Here is the fix: as in Apache Spark : JDBC connection not working, adding prop.put("driver", "org.postgresql.Driver") works fine.
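In code, the fix looks roughly like this (a sketch; the connection details are placeholders, and only the prop.put("driver", ...) line is the actual fix):

import java.util.Properties
import org.apache.spark.sql.SQLContext

def readWithExplicitDriver(sqlContext: SQLContext) = {
  val prop = new Properties()
  prop.put("user", "dbuser")       // placeholder credentials
  prop.put("password", "secret")
  // The crucial line: register the driver class explicitly instead of
  // relying on java.sql.DriverManager to discover it from the classpath.
  prop.put("driver", "org.postgresql.Driver")
  sqlContext.read.jdbc("jdbc:postgresql://localhost:5432/mydb", "mytable", prop)
}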

The strange thing is that the connection does not seem to be stable; e.g., with the HiveContext it only works one out of two times.

Upvotes: 1
