Reputation: 17676
Hi, there are already many questions out there regarding this topic; the solution always was:
I set up a minimal example here: https://github.com/geoHeil/sparkJDBCHowTo, trying both methods, but neither worked for me. I am getting java.sql.SQLException: No suitable driver.
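For reference, a minimal sketch of the kind of JDBC read that raises this error (URL, table, and credentials are hypothetical; the linked repository may use a SQLContext/HiveContext rather than SparkSession):

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read").getOrCreate()

val props = new Properties()
props.put("user", "myUser")          // hypothetical credentials
props.put("password", "myPassword")

// Without the JDBC driver jar on the classpath (or an explicit "driver" property),
// this fails with java.sql.SQLException: No suitable driver
val df = spark.read.jdbc("jdbc:postgresql://dbhost:5432/mydb", "mytable", props)
df.show()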
Upvotes: 0
Views: 574
Reputation: 1
That’s pretty straightforward. To connect to an external database and retrieve data into Spark DataFrames, an additional jar file is required.
E.g. with MySQL, the JDBC driver is required. Download the driver package and extract mysql-connector-java-x.yy.zz-bin.jar to a path that is accessible from every node in the cluster, preferably on a shared file system. E.g. with a Pouta Virtual Cluster such a path would be under /shared_data; here I use /shared_data/thirdparty_jars/.
With direct Spark job submissions from the terminal, one can specify the --driver-class-path argument pointing to extra jars that should be provided to workers with the job. However, this does not work with this approach, so we must configure these paths for the front-end and worker nodes in the spark-defaults.conf file, usually in the /opt/spark/conf directory.
Place whichever jar matches the database server you are using in:
spark.driver.extraClassPath /"your-path"/mysql-connector-java-5.1.35-bin.jar
spark.executor.extraClassPath /"your-path"/mysql-connector-java-5.1.35-bin.jar
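A quick way to verify the configuration (a sketch, assuming the MySQL connector jar from above): start a spark-shell on a node and try to load the driver class; if the jar is on the configured classpath this succeeds, otherwise it throws ClassNotFoundException.

// run in spark-shell; throws ClassNotFoundException if the connector jar is not visible
Class.forName("com.mysql.jdbc.Driver")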
Upvotes: 0
Reputation: 17676
Here is the fix, as in Apache Spark : JDBC connection not working: adding prop.put("driver", "org.postgresql.Driver") works fine.
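For context, a minimal sketch of that fix (URL, table, and credentials are hypothetical, and my actual code uses a Hive context rather than the SparkSession API shown here): the extra "driver" property tells Spark which JDBC driver class to load instead of relying on DriverManager to discover it from the URL.

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("postgres-jdbc").getOrCreate()

val prop = new Properties()
prop.put("user", "myUser")            // hypothetical credentials
prop.put("password", "myPassword")
prop.put("driver", "org.postgresql.Driver")   // the line that fixes "No suitable driver"

val df = spark.read.jdbc("jdbc:postgresql://dbhost:5432/mydb", "mytable", prop)
df.show()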
The strange thing is that the connection does not seem to be stable, e.g. with the HiveContext it only works one out of two times.
Upvotes: 1