Daniel Salcedo

Reputation: 332

AWS GlueStudio to Snowflake JDBC: An error occurred while calling pyWriteDynamicFrame. No suitable driver

I'm trying to move data from a Data Catalog table (MySQL) through AWS Glue (in the visual mode of Glue Studio) into a Snowflake table.

For this, I'm following the guide Performing data transformations using Snowflake and AWS Glue [1].

I'm following every part of it, but when executing my job, I get the error:

An error occurred while calling xxx.pyWriteDynamicFrame. No suitable driver

(Yes, other than the stack trace, there's no more info in the error message.) I've tested everything I could think of.

One thing I've noticed is that many similar issues are reported for AWS Glue jobs written as scripts (not in the visual editor), and in most of them two jars are used: the Snowflake JDBC driver and the Snowflake Spark Connector. Even though the tutorial I followed [1] isn't clear on this point, I tried having both files in my 'drivers' bucket, but I still get the same error.
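
For reference, those script-based examples write through the Snowflake Spark connector instead of Glue's custom JDBC sink. A minimal sketch of that pattern (assuming both jars are attached to the job; every sf* value below is a placeholder):

# Sketch of the Spark-connector write path used in script-mode Glue jobs.
# Assumes the snowflake-jdbc and spark-snowflake jars are attached to the
# job; all sf* values below are placeholders.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "myuser",
    "sfPassword": "mypassword",
    "sfDatabase": "mydb",
    "sfSchema": "myschema",
    "sfWarehouse": "mywarehouse",
}

# dynamic_frame is the DynamicFrame to write (e.g. the output of ApplyMapping).
# Convert it to a Spark DataFrame and write through the connector.
dynamic_frame.toDF().write \
    .format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sf_options) \
    .option("dbtable", "myTargetTable") \
    .mode("append") \
    .save()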

I've tried many versions of both files to no avail.
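
In case it matters, my understanding is that script-mode jobs attach these jars through the --extra-jars job argument. A hypothetical boto3 sketch (bucket and paths below are made up):

import boto3

# Hypothetical sketch: attaching both jars to a script-mode job via the
# --extra-jars special parameter (bucket and paths are made up).
glue = boto3.client("glue")
glue.create_job(
    Name="snowflake-load-job",
    Role="AWSGlueServiceRole-snowflake",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/snowflake_load.py",
    },
    DefaultArguments={
        "--extra-jars": "s3://my-bucket/aws-glue-drivers/snowflake-jdbc.jar,"
                        "s3://my-bucket/aws-glue-drivers/spark-snowflake.jar",
    },
)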

So I have no idea what the cause might be. (I even tried a different AWS account and a different Snowflake account so I could have full access to all resources.)

Have any of you tried this setup?

I'm using:


[Edit Jun 23] Regarding @jonlegend's comment: I'm using the visual editor for this job, so I'm not in control of the code implementation. Nevertheless, here is the generated code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "mydb_name", table_name = "mytable_name", transformation_ctx = "DataSource0"]
## @return: DataSource0
## @inputs: []
DataSource0 = glueContext.create_dynamic_frame.from_catalog(database = "mydb_name", table_name = "mytable_name", transformation_ctx = "DataSource0")
## @type: ApplyMapping
## @args: [mappings = [("account_number", "string", "account_number", "string"), ("user_id", "int", "user_id", "int"), ("description", "string", "description", "string"), ("id", "int", "id", "int"), ("group_account_id", "int", "group_account_id", "int"), ("updated", "timestamp", "updated", "timestamp")], transformation_ctx = "Transform0"]
## @return: Transform0
## @inputs: [frame = DataSource0]
Transform0 = ApplyMapping.apply(frame = DataSource0, mappings = [("account_number", "string", "account_number", "string"), ("user_id", "int", "user_id", "int"), ("description", "string", "description", "string"), ("id", "int", "id", "int"), ("group_account_id", "int", "group_account_id", "int"), ("updated", "timestamp", "updated", "timestamp")], transformation_ctx = "Transform0")
## @type: DataSink
## @args: [connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0"]
## @return: DataSink0
## @inputs: [frame = Transform0]
DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0")
job.commit()

Also, here is the stack trace:

ERROR [main] glue.ProcessLauncher (Logging.scala:logError(70)): Error from Python:Traceback (most recent call last):
  File "/tmp/test_job_v7.py", line 30, in <module>
    DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = 
{
    "dbTable": "myschema.myTargetTable",
    "connectionName": "snowflake-connection-v7"
}
, transformation_ctx = "DataSink0")
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 653, in from_options
    format_options, transformation_ctx)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 281, in write_dynamic_frame_from_options
    format, format_options, transformation_ctx)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/context.py", line 304, in write_from_options
    return sink.write(frame_or_dfc)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", line 35, in write
    return self.writeFrame(dynamic_frame_or_dfc, info)
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/data_sink.py", line 31, in writeFrame
    return DynamicFrame(self._jsink.pyWriteDynamicFrame(dynamic_frame._jdf, callsite(), info), dynamic_frame.glue_ctx, dynamic_frame.name + "_errors")
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o104.pyWriteDynamicFrame.
: java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:105)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCOptions.<init>(GlueJDBCOptions.scala:14)
    at org.apache.spark.sql.jdbc.glue.GlueJDBCOptions.<init>(GlueJDBCOptions.scala:17)
    at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCRecordWriterFactory.<init>(PartnerJDBCDataSink.scala:78)
    at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCDataSink.createWriterFactory(PartnerJDBCDataSink.scala:32)
    at com.amazonaws.services.glue.marketplace.partner.PartnerJDBCDataSink.createWriterFactory(PartnerJDBCDataSink.scala:23)
    at com.amazonaws.services.glue.marketplace.connector.GlueCustomDataSink.defaultWriteDynamicFrame(CustomDataSink.scala:68)
    at com.amazonaws.services.glue.marketplace.connector.GlueCustomDataSink.writeDynamicFrame(CustomDataSink.scala:61)
    at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Finally, I've checked both the error logs and the job logs, and the error is the same. Earlier messages in both logs don't help either.

Upvotes: 4

Views: 2898

Answers (1)

Jon Legendre

Reputation: 370

A few suggestions:

Ensure your JDBC driver is referenced in your code. I'm not sure how to do this in the visual editor, but in the generated script, change the following line:

DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = {"dbTable":"myschema.myTargetTable","connectionName":"snowflake-connection-v7"}, transformation_ctx = "DataSink0")

to:

DataSink0 = glueContext.write_dynamic_frame.from_options(frame = Transform0, connection_type = "custom.jdbc", connection_options = { 
  "dbTable":"myschema.myTargetTable",
  "connectionName":"snowflake-connection-v7",
  "customJdbcDriverS3Path": "Amazon S3 path of the custom JDBC driver",
  "customJdbcDriverClassName":"class name of the driver"
}, transformation_ctx = "DataSink0")
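
For Snowflake, the driver class name is net.snowflake.client.jdbc.SnowflakeDriver, so a filled-in version would look roughly like this (the S3 path below is just an example):

# Illustrative values: the S3 path is a placeholder, but the class name is
# the standard Snowflake JDBC driver class.
DataSink0 = glueContext.write_dynamic_frame.from_options(
    frame=Transform0,
    connection_type="custom.jdbc",
    connection_options={
        "dbTable": "myschema.myTargetTable",
        "connectionName": "snowflake-connection-v7",
        "customJdbcDriverS3Path": "s3://somebucket/aws-glue-drivers/snowflake-jdbc.jar",
        "customJdbcDriverClassName": "net.snowflake.client.jdbc.SnowflakeDriver",
    },
    transformation_ctx="DataSink0",
)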

Also ensure that your Glue job has IAM permissions on the S3 folder where the driver is located. If you are using the default service-role/AWSGlueServiceRole, just ensure that the string "aws-glue-" appears somewhere in the S3 path, e.g. "s3://somebucket/aws-glue-drivers/mydriver.jar".

Upvotes: 0
