Reputation: 10501
Per these Amazon RDS docs, it looks like AWS offers an aws_s3 PostgreSQL extension for importing data from S3 into PostgreSQL on RDS.
We're using Airflow to orchestrate our data ingestion pipelines, and it would be great if there were a Python solution here. I have little experience with PostgreSQL and have never used any PostgreSQL extensions, so being able to move data around using Python would help us a ton. For the time being, we are avoiding AWS tools such as AWS Data Pipeline and AWS Glue in favor of building our own architecture with Python and Airflow.
For reference, this is how our GCP architecture ingests data from GCS into BigQuery using Python:
from google.cloud import bigquery

# create BigQuery client object + load job config
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    schema=None,  # no explicit schema; rely on autodetect for now
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,  # use ndjson
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # append to existing
    autodetect=True,
)

# and load into BigQuery
table_id = "our_gcp_project.our_model.our_table"
gcs_uri = "gs://our_bucket/path-to-our/file.json"
# makes an API request; optionally pass location="US"
load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
load_job.result()  # waits for the job to complete

# check for success
destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
We're pretty much looking to port this code from GCS/BigQuery into S3/Postgres RDS, and want to get started in the right direction.
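As a rough idea of the shape such a port might take, here is a minimal sketch that calls the aws_s3 extension from Python. It assumes the extension is already installed on the RDS instance, that the instance has an IAM role allowing it to read the bucket, that the target table already exists, and that the file is CSV (the import runs through COPY, so ndjson would likely need a staging step). The helper names build_s3_import and load_from_s3, the CSV options, and the connection object (any DB-API connection such as one from psycopg2) are illustrative assumptions, not something taken from the docs.

```python
def build_s3_import(table, bucket, key, region,
                    options="(format csv, header true)"):
    """Return (sql, params) for a call to aws_s3.table_import_from_s3.

    An empty column list asks the extension to load all columns of the table.
    """
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, %s, %s, aws_commons.create_s3_uri(%s, %s, %s))"
    )
    params = (table, "", options, bucket, key, region)
    return sql, params


def load_from_s3(conn, table, bucket, key, region):
    """Run the import on an open DB-API connection (e.g. psycopg2).

    Returns the status text reported by the extension.
    """
    sql, params = build_s3_import(table, bucket, key, region)
    with conn.cursor() as cur:
        cur.execute(sql, params)
        status = cur.fetchone()[0]
    conn.commit()
    return status
```

In an Airflow task this would probably be called with a connection obtained from whatever Postgres hook or driver is already in use, for example load_from_s3(conn, "our_table", "our_bucket", "path-to-our/file.csv", "us-east-1").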
Upvotes: 2
Views: 1371
Reputation: 795
RDS for PostgreSQL gives you the option to invoke AWS Lambda functions from the database (through the aws_lambda extension).
The Lambda runtime can be set to Python, and you can use the Boto3 library to access AWS services (like S3) from the Lambda.
Be aware of Lambda's limitations, such as the 15-minute maximum run time and the limits on payload size.
Also, when creating a Lambda that needs access to the DB, you will need to create a layer that contains the database drivers and assign it to your Lambda.
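To make the Lambda side more concrete, here is a minimal sketch of a Python handler that reads a newline-delimited JSON object from S3 and counts its rows. The event keys "bucket" and "key", the make_handler helper, and the returned dictionary are assumptions made for illustration; writing the rows to the database is left out, since that depends on the driver you package in the layer.

```python
import json


def make_handler(s3_client):
    """Build a Lambda handler around an S3 client (e.g. boto3.client("s3"))."""

    def handler(event, context):
        # Hypothetical event shape: {"bucket": "...", "key": "..."}
        bucket = event["bucket"]
        key = event["key"]

        # get_object returns a dict whose "Body" is a readable stream
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Parse ndjson: one JSON document per non-empty line
        rows = [
            json.loads(line)
            for line in body.decode("utf-8").splitlines()
            if line.strip()
        ]

        # Inserting rows into PostgreSQL would go here, using the driver
        # shipped in the Lambda layer.
        return {"bucket": bucket, "key": key, "row_count": len(rows)}

    return handler


# In the deployed Lambda module, something like:
#   import boto3
#   handler = make_handler(boto3.client("s3"))
```

Passing the client in rather than creating it inside the handler keeps the function easy to exercise locally with a stub, which helps given the run time and payload limits mentioned above.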
Upvotes: 1