Reputation: 14523
I want to use AWS Data Pipeline to pipe data from a Postgres RDS instance to AWS S3. Does anybody know how this is done?
More precisely, I want to export a Postgres table to AWS S3 using Data Pipeline. The reason I am using Data Pipeline is that I want to automate this process, and the export is going to run once every week.
Any other suggestions will also work.
Upvotes: 11
Views: 17761
Reputation: 1036
You can now do this with the aws_s3.query_export_to_s3 function within Postgres itself: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/postgresql-s3-export.html
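As a minimal sketch of the call, assuming the aws_s3 extension is available on your instance and the instance has an IAM role that allows writing to the target bucket (the bucket, key, and region below are hypothetical placeholders):

-- One-time setup: install the extension (CASCADE pulls in aws_commons too).
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

-- Export a query result to a CSV object in S3.
-- 'my-export-bucket', 'exports/my_table.csv', and 'us-east-1' are placeholders.
SELECT *
FROM aws_s3.query_export_to_s3(
    'SELECT * FROM my_table',
    aws_commons.create_s3_uri('my-export-bucket', 'exports/my_table.csv', 'us-east-1'),
    options := 'format csv'
);

Scheduling the weekly run is then a separate concern (cron, a Lambda on a schedule, etc.).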
Upvotes: 1
Reputation: 476
You can define a CopyActivity in the Data Pipeline interface to extract data from a Postgres RDS instance into S3.
Another option is to use an external tool like Alooma. Alooma can replicate tables from a PostgreSQL database hosted on Amazon RDS to Amazon S3 (https://www.alooma.com/integrations/postgresql/s3). The process can be automated, and you can run it once a week.
Upvotes: 4
Reputation: 3274
I built a pipeline from scratch using the MySQL template and the documentation as references.
You need to have the roles in place: DataPipelineDefaultResourceRole and DataPipelineDefaultRole.
I haven't loaded the parameters, so you either need to go into the Architect and put in your credentials and folders, or declare them in the parameters section (see the sketch after the definition).
Hope it helps.
{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "#{myS3LogsPath}",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "database": {
        "ref": "DatabaseId_WC2j5"
      },
      "name": "DefaultSqlDataNode1",
      "id": "SqlDataNodeId_VevnE",
      "type": "SqlDataNode",
      "selectQuery": "#{myRDSSelectQuery}",
      "table": "#{myRDSTable}"
    },
    {
      "*password": "#{*myRDSPassword}",
      "name": "RDS_database",
      "id": "DatabaseId_WC2j5",
      "type": "RdsDatabase",
      "rdsInstanceId": "#{myRDSId}",
      "username": "#{myRDSUsername}"
    },
    {
      "output": {
        "ref": "S3DataNodeId_iYhHx"
      },
      "input": {
        "ref": "SqlDataNodeId_VevnE"
      },
      "name": "DefaultCopyActivity1",
      "runsOn": {
        "ref": "ResourceId_G9GWz"
      },
      "id": "CopyActivityId_CapKO",
      "type": "CopyActivity"
    },
    {
      "dependsOn": {
        "ref": "CopyActivityId_CapKO"
      },
      "filePath": "#{myS3Container}#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "DefaultS3DataNode1",
      "id": "S3DataNodeId_iYhHx",
      "type": "S3DataNode"
    },
    {
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "instanceType": "m1.medium",
      "name": "DefaultResource1",
      "id": "ResourceId_G9GWz",
      "type": "Ec2Resource",
      "terminateAfter": "30 Minutes"
    }
  ],
  "parameters": []
}
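As a sketch, the empty parameters array could be filled in along these lines so that the #{...} references above resolve; the ids must match exactly, while the types and descriptions here are illustrative:

"parameters": [
  {
    "id": "myS3LogsPath",
    "type": "AWS::S3::ObjectKey",
    "description": "S3 folder for pipeline logs"
  },
  {
    "id": "myRDSSelectQuery",
    "type": "String",
    "description": "Select query to run against the table"
  },
  {
    "id": "myRDSTable",
    "type": "String",
    "description": "RDS Postgres table name"
  },
  {
    "id": "*myRDSPassword",
    "type": "String",
    "description": "RDS Postgres password (the * marks it as hidden)"
  },
  {
    "id": "myRDSId",
    "type": "String",
    "description": "RDS instance id"
  },
  {
    "id": "myRDSUsername",
    "type": "String",
    "description": "RDS Postgres username"
  },
  {
    "id": "myS3Container",
    "type": "AWS::S3::ObjectKey",
    "description": "S3 output folder"
  }
]

Parameter values can then be entered in the console when activating the pipeline, or supplied alongside the definition when you upload it.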
Upvotes: 2
Reputation: 118
There is a sample on GitHub: https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RDStoS3
Here is the pipeline definition: https://github.com/awslabs/data-pipeline-samples/blob/master/samples/RDStoS3/RDStoS3Pipeline.json
Upvotes: 5