Piyush Patil

Reputation: 14523

Exporting an AWS Postgres RDS Table to AWS S3

I wanted to use AWS Data Pipeline to pipe data from a Postgres RDS instance to AWS S3. Does anybody know how this is done?

More precisely, I want to export a Postgres table to AWS S3 using Data Pipeline. The reason I am using Data Pipeline is that I want to automate this process, and this export is going to run once every week.

Any other suggestions will also work.

Upvotes: 11

Views: 17761

Answers (4)

user433342

Reputation: 1036

You can now do this with the aws_s3.query_export_to_s3 function within Postgres itself: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/postgresql-s3-export.html
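
For example, a minimal sketch (the table name, bucket, key and region below are placeholders to replace with your own):

    -- Installs the export function and its aws_commons dependency
    CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

    -- Export the query result to S3 as CSV
    SELECT * FROM aws_s3.query_export_to_s3(
        'SELECT * FROM my_table',                                                     -- query to export (placeholder)
        aws_commons.create_s3_uri('my-bucket', 'exports/my_table.csv', 'us-east-1'),  -- bucket/key/region (placeholders)
        options := 'format csv'                                                       -- passed through to COPY
    );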

Upvotes: 1

Dina Kaiser

Reputation: 476

You can define a copy-activity in the Data Pipeline interface to extract data from a Postgres RDS instance into S3.

  1. Create a data node of the type SqlDataNode. Specify the table name and select query.
  2. Set up the database connection by specifying the RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with the username, password and database name.
  3. Create a data node of the type S3DataNode.
  4. Create a CopyActivity and set the SqlDataNode as input and the S3DataNode as output. (A CLI sketch for registering and activating such a pipeline follows this list.)
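
If you prefer to script this instead of clicking through the console, the same pipeline can be registered and activated with the AWS CLI. A minimal sketch, assuming the definition is saved as pipeline.json (the pipeline name and the df-... ID are placeholders):

    # Create the pipeline shell; this returns a pipeline ID
    aws datapipeline create-pipeline --name rds-to-s3-export --unique-id rds-to-s3-export

    # Upload the definition, then activate the pipeline
    aws datapipeline put-pipeline-definition --pipeline-id df-0123456789ABCDEF --pipeline-definition file://pipeline.json
    aws datapipeline activate-pipeline --pipeline-id df-0123456789ABCDEF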

Another option is to use an external tool like Alooma. Alooma can replicate tables from a PostgreSQL database hosted on Amazon RDS to Amazon S3 (https://www.alooma.com/integrations/postgresql/s3). The process can be automated, and you can run it once a week.

Upvotes: 4

Jonathan P. Diaz

Reputation: 3274

I built a pipeline from scratch using the MySQL template and the documentation as a reference.

You need to have the roles in place: DataPipelineDefaultResourceRole and DataPipelineDefaultRole.

I haven't loaded the parameters, so you need to go into Architect and put in your credentials and folders.
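
If you do want to parameterize it, the entries in the (currently empty) parameters array would look roughly like this — a sketch where the ids simply match the #{...} references in the definition below, and the leading * marks the password as hidden:

    "parameters": [
      { "id": "myS3LogsPath", "type": "AWS::S3::ObjectStorePath", "description": "S3 folder for pipeline logs" },
      { "id": "myRDSSelectQuery", "type": "String", "description": "Select query to run against the table" },
      { "id": "myRDSTable", "type": "String", "description": "Name of the table to export" },
      { "id": "myRDSId", "type": "String", "description": "RDS instance ID" },
      { "id": "myRDSUsername", "type": "String", "description": "Database username" },
      { "id": "*myRDSPassword", "type": "String", "description": "Database password (kept hidden)" },
      { "id": "myS3Container", "type": "AWS::S3::ObjectStorePath", "description": "S3 output folder" }
    ]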

Hope it helps.

{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "#{myS3LogsPath}",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "database": {
        "ref": "DatabaseId_WC2j5"
      },
      "name": "DefaultSqlDataNode1",
      "id": "SqlDataNodeId_VevnE",
      "type": "SqlDataNode",
      "selectQuery": "#{myRDSSelectQuery}",
      "table": "#{myRDSTable}"
    },
    {
      "*password": "#{*myRDSPassword}",
      "name": "RDS_database",
      "id": "DatabaseId_WC2j5",
      "type": "RdsDatabase",
      "rdsInstanceId": "#{myRDSId}",
      "username": "#{myRDSUsername}"
    },
    {
      "output": {
        "ref": "S3DataNodeId_iYhHx"
      },
      "input": {
        "ref": "SqlDataNodeId_VevnE"
      },
      "name": "DefaultCopyActivity1",
      "runsOn": {
        "ref": "ResourceId_G9GWz"
      },
      "id": "CopyActivityId_CapKO",
      "type": "CopyActivity"
    },
    {
      "dependsOn": {
        "ref": "CopyActivityId_CapKO"
      },
      "filePath": "#{myS3Container}#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "DefaultS3DataNode1",
      "id": "S3DataNodeId_iYhHx",
      "type": "S3DataNode"
    },
    {
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "instanceType": "m1.medium",
      "name": "DefaultResource1",
      "id": "ResourceId_G9GWz",
      "type": "Ec2Resource",
      "terminateAfter": "30 Minutes"
    }
  ],
  "parameters": [
  ]
}
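
Note that this definition uses "scheduleType": "ONDEMAND", so it only runs when activated manually. For the weekly run the question asks about, one option (a sketch; the object name WeeklySchedule is a placeholder) is to add a Schedule object:

    {
      "id": "WeeklySchedule",
      "name": "WeeklySchedule",
      "type": "Schedule",
      "period": "1 weeks",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    }

and in the Default object replace "scheduleType": "ONDEMAND" with:

    "scheduleType": "cron",
    "schedule": { "ref": "WeeklySchedule" }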

Upvotes: 2
