tamjd1

Reputation: 966

Copy data from AWS S3 to Aurora Postgres

I'm trying to copy data from AWS S3 to Aurora Postgres. Currently my process is as follows:

  1. Download the files from S3 to local storage (on an EC2 instance)
  2. Run a "COPY FROM STDIN ..." command to load the data from the input stream into Aurora Postgres (see the sketch below).
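
For example, step 2 boils down to something like this in psql, where the table and file names are placeholders and \copy streams the local file through COPY ... FROM STDIN:

postgres=> \copy my_table FROM 'downloaded_file.csv' WITH (FORMAT csv)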

I'm wondering if there is a command to copy data directly from S3 into Aurora Postgres.

Upvotes: 10

Views: 6330

Answers (2)

rmcsharry

Reputation: 5552

There is a guide from Amazon itself that walks through this: Importing data from Amazon S3 into your Aurora PostgreSQL DB cluster, in the Amazon Aurora User Guide.
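
The steps below also assume that the aws_s3 extension is installed on the cluster; installing it with CASCADE pulls in the aws_commons extension it depends on. A minimal psql sketch, assuming a user with sufficient privileges (rds_superuser):

postgres=> CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;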

First, gather the details that you need to supply to the function. These include the name of the table on your Aurora PostgreSQL DB cluster's instance, and the bucket name, file path, file type, and AWS Region where the Amazon S3 data is stored. For more information, see View an object in the Amazon Simple Storage Service User Guide.

Note: Multipart data import from Amazon S3 isn't currently supported.

1. Get the name of the table into which the aws_s3.table_import_from_s3 function is to import the data. As an example, the following command creates a table t1 that can be used in the later steps.

postgres=> CREATE TABLE t1 
    (col1 varchar(80), 
    col2 varchar(80), 
    col3 varchar(80));

2. Get the details about the Amazon S3 bucket and the data to import. To do this, open the Amazon S3 console at https://console.aws.amazon.com/s3/ and choose Buckets. Find the bucket containing your data in the list. Choose the bucket, open its Object overview page, and then choose Properties.

Make a note of the bucket name, path, AWS Region, and file type. You need the Amazon Resource Name (ARN) later to set up access to Amazon S3 through an IAM role. For more information, see Setting up access to an Amazon S3 bucket.

You can verify the path to the data on the Amazon S3 bucket by using the AWS CLI command aws s3 cp. If the information is correct, this command downloads a copy of the Amazon S3 file.

3. Set up permissions on your Aurora PostgreSQL DB cluster to allow access to the file on the Amazon S3 bucket. To do so, you use either an AWS Identity and Access Management (IAM) role or security credentials. For more information, see Setting up access to an Amazon S3 bucket.
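
If you go the security-credentials route instead of an IAM role, the credentials are wrapped with aws_commons.create_aws_credentials and passed to the import function in the final step. A minimal sketch with placeholder values (not real keys); the third argument is the session token and can be left empty when you aren't using temporary credentials:

postgres=> SELECT aws_commons.create_aws_credentials(
   'sample_access_key',
   'sample_secret_key',
   ''
) AS credentials \gset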

4. Supply the path and other Amazon S3 object details gathered in step 2 to the create_s3_uri function to construct an Amazon S3 URI object. To learn more about this function, see aws_commons.create_s3_uri. The following is an example of constructing this object during a psql session.

postgres=> SELECT aws_commons.create_s3_uri(
   'docs-lab-store-for-rpg',
   'versions_and_jdks_listing.csv',
   'us-west-1'
) AS s3_uri \gset

In the next step, you pass this object (of type aws_commons._s3_uri_1) to the aws_s3.table_import_from_s3 function to import the data into the table.

5. Invoke the aws_s3.table_import_from_s3 function to import the data from Amazon S3 into your table. For reference information, see aws_s3.table_import_from_s3. For examples, see Importing data from Amazon S3 into your Aurora PostgreSQL DB cluster.
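
A minimal sketch of that final call, reusing the t1 table and the s3_uri variable created above and assuming the file is plain CSV (the third argument takes regular PostgreSQL COPY options):

postgres=> SELECT aws_s3.table_import_from_s3(
   't1',             -- target table
   '',               -- optional column list; empty means all columns
   '(format csv)',   -- PostgreSQL COPY options
   :'s3_uri'         -- the URI object created with aws_commons.create_s3_uri
);

If you are using security credentials rather than an IAM role, pass the value created with aws_commons.create_aws_credentials (for example, the credentials variable from step 3) as a fifth argument.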

Upvotes: 0

strongjz

Reputation: 4491

This blog post from AWS mentions being able to do it with AWS Database Migration Service (DMS):

https://aws.amazon.com/blogs/database/stream-data-into-an-aurora-postgresql-database-using-aws-dms-and-amazon-kinesis-data-firehose/

AWS DMS can read data from a source S3 bucket and load it into a target database. To do this, provide access to an S3 bucket containing one or more data files. In that S3 bucket, include a JSON file that describes the mapping between the data in those files and the target database tables.

The source data files must be in comma-separated value (CSV) format. Name the files using the DMS naming convention, in which schemaName is the source schema and tableName is the name of a table within that schema. A sketch of the JSON mapping file is shown below.
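
A sketch of what that JSON mapping file (DMS's external table definition for an S3 source) can look like, with placeholder schema, table, and column values:

{
  "TableCount": "1",
  "Tables": [
    {
      "TableName": "employee",
      "TablePath": "hr/employee/",
      "TableOwner": "hr",
      "TableColumns": [
        { "ColumnName": "Id", "ColumnType": "INT8", "ColumnNullable": "false", "ColumnIsPk": "true" },
        { "ColumnName": "LastName", "ColumnType": "STRING", "ColumnLength": "20" }
      ],
      "TableColumnsTotal": "2"
    }
  ]
}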

Upvotes: 1
