Reputation: 4582
I want to connect to a Presto server using JDBC from PySpark. I followed a tutorial written in Java and am trying to do the same in my Python 3 code, but I get an error:
: java.sql.SQLException: No suitable driver
I have tried to execute the following code:
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:presto://my_machine_ip:8080/hive/default") \
    .option("user", "airflow") \
    .option("dbtable", "may30_1") \
    .load()
It should be noted that I am using Spark on EMR, so Spark itself is already provided for me.
The code from the aforementioned tutorial is:
final String JDBC_DRIVER = "com.facebook.presto.jdbc.PrestoDriver";
final String DB_URL = "jdbc:presto://localhost:9000/catalogName/schemaName";
// Database credentials
final String USER = "username";
final String PASS = "password";
Connection conn = null;
Statement stmt = null;
try {
    // Register JDBC driver
    Class.forName(JDBC_DRIVER);
Kindly notice the JDBC_DRIVER in the code above; I have not been able to figure out the corresponding setting in Python 3, i.e. in PySpark. Nor have I added the driver as a dependency in any configuration.
I expect to connect to Presto successfully. Instead, I am getting the following error (complete stack trace):
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 172, in load
return self._df(self._jreader.load())
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o90.load.
: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Upvotes: 2
Views: 11590
Reputation: 9921
You need to perform the following steps to connect to Presto from PySpark.
Download the Presto JDBC driver jar from the official website and copy it onto the Spark master node.
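For example, the driver can be fetched from Maven Central; the version and target directory below are placeholders, so substitute the version that matches your Presto server:
# Replace 0.221 with the driver version matching your Presto server,
# and /home/hadoop/jars/ with wherever you want to keep the jar.
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-jdbc/0.221/presto-jdbc-0.221.jar -P /home/hadoop/jars/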
Start the pyspark shell with the following parameters:
bin/pyspark --driver-class-path /path/to/presto/jdbc/driver/jar/file --jars /path/to/presto/jdbc/driver/jar/file
Note that --driver-class-path and --jars both take the path to the driver jar, not the driver class name; the class name is passed to the DataFrame reader below.
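If you run your code as a script on EMR rather than from the interactive shell, the same options can be passed to spark-submit (the script name below is just a placeholder):
spark-submit \
    --driver-class-path /path/to/presto/jdbc/driver/jar/file \
    --jars /path/to/presto/jdbc/driver/jar/file \
    my_presto_job.py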
Then specify the Presto driver class explicitly via the driver option:
jdbcDF = spark.read \
    .format("jdbc") \
    .option("driver", "com.facebook.presto.jdbc.PrestoDriver") \
    .option("url", "jdbc:presto://<machine_ip>:8080/hive/default") \
    .option("user", "airflow") \
    .option("dbtable", "may30_1") \
    .load()
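As a quick sanity check that the driver is picked up and the connection works (a minimal sketch; the output depends on what is in your may30_1 table):
# Inspect the schema and a few rows to confirm the JDBC connection works
jdbcDF.printSchema()
jdbcDF.show(5)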
Upvotes: 4