Eshank Jain
Eshank Jain

Reputation: 179

Pipeline from AWS RDS to S3 using Glue

I was trying AWS Glue to migrate our current data pipeline from python scripts to AWS Glue . I was able to setup a crawler to pull the schema for the different postgres databases . However, I am facing issues in pulling data from Postgres RDS to S3 tables in Athena .

Thanks in advance !

Upvotes: 6

Views: 7080

Answers (1)

Tanveer Uddin
Tanveer Uddin

Reputation: 1525

You can't pull data from AWS RDS to S3 using Athena. Athena is a query engine over S3 data. To be able to extract data from RDS to S3, you can run a Glue job to read from a particular RDS table and create S3 dump in parquet format which will create another external table pointing to S3 data. Then you can query that S3 data using Athena. A sample code snippet to read from RDS using Glue catalog and write parquet in S3 will look like below. There are some Glue predefined template which you can use to experiment. Start with a small table first. Please let me know if it worked out for you or any further questions/issues.

datasource0 = glueContext.create_dynamic_frame.from_options(connection_type="postgresql", connection_options = 
{"url": "jdbc-url/database",
"user": "user_name",
"password": "password",
"dbtable": "table_name"},
transformation_ctx = "datasource0")

   datasink4 = glueContext.write_dynamic_frame.from_options(frame = datasource0, connection_type = "s3", connection_options = {"path": "s3://aws-glue-tpcds-parquet/"+ tableName + "/"}, format = "parquet", transformation_ctx = "datasink4")

Upvotes: 9

Related Questions