Reputation: 123
I need to ingest data from an existing database located in our own network into Redshift using AWS Glue. I can connect to it from an EC2 instance, but I have no idea how to connect to it from AWS Glue. Would someone give me any advice? I think there is some magic in the VPC settings, but I found no hints after searching Google.
Upvotes: 2
Views: 3527
Reputation: 1202
# Upload the JDBC driver to an S3 location (in my case
# "s3://manikantabucket/com.mysql.jdbc_5.1.5.jar") and add the driver
# jar file in the Glue job properties.
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *

glueContext = GlueContext(SparkContext.getOrCreate())

connection_mysql8_options_source_emp = {
    "url": "jdbc:mysql://ec2-2-138-264-235.ap-east-1.compute.amazonaws.com:3306/mysql",
    "dbtable": "db",
    "user": "root",
    "password": "root",
    "customJdbcDriverS3Path": "s3://manikantabucket/com.mysql.jdbc_5.1.5.jar",
    "customJdbcDriverClassName": "com.mysql.jdbc.Driver",
}

# Read the source table into a DynamicFrame, then coalesce it to a
# single partition so the output lands in one file.
df_emp = glueContext.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options=connection_mysql8_options_source_emp,
)
df_emp = df_emp.coalesce(1)

# Write the result to S3 as CSV.
S3bucket_node3 = glueContext.write_dynamic_frame.from_options(
    frame=df_emp,
    connection_type="s3",
    format="csv",
    connection_options={"path": "s3://manikantabucketo/test/", "partitionKeys": []},
    transformation_ctx="S3bucket_node3",
)
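The "add driver jar file in the Glue job properties" step can also be done when the job is created programmatically, by passing the S3 path of the driver through the `--extra-jars` default argument. A minimal sketch with boto3; the job name, IAM role, and script location below are placeholders, not real resources:

```python
# Sketch: create the Glue job with the custom JDBC driver attached via
# the --extra-jars default argument. Name, Role, and ScriptLocation are
# placeholder values.
job_input = {
    "Name": "mysql-to-s3",
    "Role": "arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder
    "Command": {
        "Name": "glueetl",
        "ScriptLocation": "s3://manikantabucket/scripts/mysql_to_s3.py",
        "PythonVersion": "3",
    },
    "DefaultArguments": {
        # Puts the uploaded driver jar on the job's classpath.
        "--extra-jars": "s3://manikantabucket/com.mysql.jdbc_5.1.5.jar",
    },
    "GlueVersion": "3.0",
}

# Uncomment to actually create the job:
# import boto3
# boto3.client("glue").create_job(**job_input)
```

This is equivalent to filling in the "Dependent JARs path" field in the console's job properties.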
Upvotes: 0
Reputation: 238837
Would someone give me any advice?
AWS wrote dedicated articles about this topic:
How to access and analyze on-premises data stores using AWS Glue
Use AWS Glue to run ETL jobs against non-native JDBC data sources
These would be a good starting point.
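The "magic in the VPC settings" those articles describe boils down to attaching a Glue Connection whose `PhysicalConnectionRequirements` pin Glue's elastic network interfaces into a subnet that can reach the on-premises network (for example over VPN or Direct Connect). A minimal sketch with boto3; every identifier below (connection name, host, subnet, security group, AZ) is a placeholder:

```python
# Sketch: define a Glue JDBC connection whose VPC settings make Glue
# launch its ENIs inside a subnet routed to the on-premises network.
# All identifiers are placeholders, not real resources.
connection_input = {
    "Name": "onprem-mysql",
    "ConnectionType": "JDBC",
    "ConnectionProperties": {
        "JDBC_CONNECTION_URL": "jdbc:mysql://10.0.0.12:3306/mydb",
        "USERNAME": "etl_user",
        "PASSWORD": "********",
    },
    # These settings are what put the Glue job inside your VPC.
    "PhysicalConnectionRequirements": {
        "SubnetId": "subnet-0123456789abcdef0",
        "SecurityGroupIdList": ["sg-0123456789abcdef0"],
        "AvailabilityZone": "ap-east-1a",
    },
}

# Uncomment to actually create the connection:
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

Note that Glue also requires the security group to have a self-referencing inbound rule (allowing all traffic from itself), and the subnet needs a route to S3 (e.g. an S3 VPC endpoint) for the job to work.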
Upvotes: 1