Daniel Salcedo

Reputation: 332

AWS GlueStudio to Snowflake JDBC: An error occurred while calling pyWriteDynamicFrame. No suitable driver

I'm trying to move data from a Data Catalog table (MySQL) through AWS Glue (in the visual mode of Glue Studio) into a Snowflake table.

For this, I'm following the guide Performing data transformations using Snowflake and AWS Glue [1].

I'm following every part of it, but when executing my job, I get the error:

An error occurred while calling xxx.pyWriteDynamicFrame. No suitable driver

(Yes, other than the stack trace, there's no more info in the error message.) I've tested everything I could think of.

One thing I've noticed is that many similar issues are reported for AWS Glue jobs written as scripts (not in the visual editor), and in most of them two jars are used: the Snowflake JDBC driver and the Snowflake Spark Connector. Even though the tutorial I followed [1] isn't clear on this point, I tried having both files in my 'drivers' bucket, but I still get the same error.
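
For reference, those script-based examples write through the Snowflake Spark connector instead of Glue's custom JDBC sink. A minimal sketch of that pattern (assuming both jars are attached to the job; every sf* value below is a placeholder):

# Sketch of the Spark-connector write path used in script-mode Glue jobs.
# Assumes the snowflake-jdbc and spark-snowflake jars are attached to the
# job; all sf* values below are placeholders.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "myuser",
    "sfPassword": "mypassword",
    "sfDatabase": "mydb",
    "sfSchema": "myschema",
    "sfWarehouse": "mywarehouse",
}

# dynamic_frame is the DynamicFrame to write (e.g. the output of ApplyMapping).
# Convert it to a Spark DataFrame and write through the connector.
dynamic_frame.toDF().write \
    .format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sf_options) \
    .option("dbtable", "myTargetTable") \
    .mode("append") \
    .save()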

I've tried many versions of both files to no avail.
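
In case it matters, my understanding is that script-mode jobs attach these jars through the --extra-jars job argument. A hypothetical boto3 sketch (bucket and paths below are made up):

import boto3

# Hypothetical sketch: attaching both jars to a script-mode job via the
# --extra-jars special parameter (bucket and paths are made up).
glue = boto3.client("glue")
glue.create_job(
    Name="snowflake-load-job",
    Role="AWSGlueServiceRole-snowflake",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/snowflake_load.py",
    },
    DefaultArguments={
        "--extra-jars": "s3://my-bucket/aws-glue-drivers/snowflake-jdbc.jar,"
                        "s3://my-bucket/aws-glue-drivers/spark-snowflake.jar",
    },
)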

So I have no idea what the cause might be. (I even tried a different AWS account and a different Snowflake account so I could have full access to all resources.)

Have any of you tried this setup?

I'm using:


[Edit Jun 23] Regarding @jonlegend's comment: I'm using the visual editor for this job, so I'm not in control of the code implementation. Nevertheless, here is the generated code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "mydb_name", table_name = "mytable_name", transformation_ctx = "DataSource0"]
## @return: DataSource0
## @inputs: []
DataSource0 = glueContext.create_dynamic_frame.from_catalog(database = "mydb_name", table_name = "mytable_name", transformation_ctx = "DataSource0")
## @type: ApplyMapping
## @args: [mappings = [("account_number", "string", "account_number", "string"), ("user_id", "int", "user_id", "int"), ("description", "string", "description", "string"), ("id", "int", "id", "int"), ("group_account_id", "int", "group_account_id", "int"), ("updated", "timestamp", "updated", "timestamp")], transformation_ctx = "Transform0"]
## @return: Transform0
## @inputs: [frame = DataSource0]
Transform0 = ApplyMapping.apply(frame = DataSource0, mappings = [("account_number", "string", "account_number", "string"), ("user_id", "int", "user_id", "int"), ("description", "string", "description", "string"), ("id", "int", "id", "int"), ("group_account_id", "int", "group_account_id", "int"), ("updated", "timestamp", "updated", "timestamp")], transformation_ctx = "Transform0")
## @type: DataSink
## @args: [connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0"]
## @return: DataSink0
## @inputs: [frame = Transform0]
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0")
job.commit()

Also, here is the stack trace:

ERROR [main] glue.ProcessLauncher (Logging.scala:logError(70)): Error from Python:Traceback (most recent call last):
  File "/tmp/test_job_v7.py", line 30, in <module>
    DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = 
{
    "dbTable": "myschema.myTargetTable",
    "connectionName": "snowflake-connection-v7"
}
, transformation_ctx = "DataSink0")
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 653, in from_options
    format_options, transformation_ctx)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 281, in write_dynamic_frame_from_options
    format, format_options, transformation_ctx)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 304, in write_from_options
    return sink.write(frame_or_dfc)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", line 35, in write
    return self.writeFrame(dynamic_frame_or_dfc, info)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", line 31, in writeFrame
    return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o104.pyWriteDynamicFrame.
: java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCOptions.<init>(GlueJDBCOptions.scala:14)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCOptions.<init>(GlueJDBCOptions.scala:17)
    at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCRecordWriterFactory.<init>(PartnerJDBCDataSink.scala:78)
    at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCDataSink.createWriterFactory(PartnerJDBCDataSink.scala:32)
    at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCDataSink.createWriterFactory(PartnerJDBCDataSink.scala:23)
    at com.amazonaws.services.glue.marketplace.connector.GlueCustomDataSink.defaultWriteDynamicFrame(CustomDataSink.scala:68)
    at com.amazonaws.services.glue.marketplace.connector.GlueCustomDataSink.writeDynamicFrame(CustomDataSink.scala:61)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Finally, I've checked both the error logs and the job logs, and the error is the same. Earlier messages in both logs don't help either.

Upvotes: 4

Views: 2898

Answers (1)

Jon Legendre

Reputation: 370

A few suggestions:

Ensure your JDBC driver is referenced in your code. I'm not sure how to do this in the visual editor, but in the generated script, change the following line:

DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0")

to:

DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = { 
  "dbTable":"myschema.myTargetTable",
  "connectionName":"snowflake-connection-v7",
  "customJdbcDriverS3Path": "Amazon S3 path of the custom JDBC driver",
  "customJdbcDriverClassName":"class name of the driver"
}, transformation_ctx = "DataSink0")
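
For Snowflake, the driver class name is net.snowflake.client.jdbc.SnowflakeDriver, so a filled-in version would look roughly like this (the S3 path below is just an example):

# Illustrative values: the S3 path is a placeholder, but the class name is
# the standard Snowflake JDBC driver class.
DataSink0 = glueContext.write_dynamic_frame.from_options(
    frame=Transform0,
    connection_type="custom.jdbc",
    connection_options={
        "dbTable": "myschema.myTargetTable",
        "connectionName": "snowflake-connection-v7",
        "customJdbcDriverS3Path": "s3://somebucket/aws-glue-drivers/snowflake-jdbc.jar",
        "customJdbcDriverClassName": "net.snowflake.client.jdbc.SnowflakeDriver",
    },
    transformation_ctx="DataSink0",
)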

Also ensure that your Glue job has IAM permissions on the S3 folder where the driver is located. If you are using the default service-role/AWSGlueServiceRole, just ensure that the string "aws-glue-" appears somewhere in the S3 path, e.g. "s3://somebucket/aws-glue-drivers/mydriver.jar".

Upvotes: 0
