Reputation: 2853
I have some data stored in S3. I need to clone/copy this data periodically from S3 to a Redshift cluster. For a one-time bulk copy, I can use the COPY command to load from S3 into Redshift.
Is there a similarly trivial way to copy data from S3 to Redshift periodically?
Thanks
Upvotes: 1
Views: 4847
Reputation: 31
You can use the COPY command with Lambda. Configure two Lambda functions: one creates a manifest file for your newly arriving data, and the other reads that manifest and loads the listed files into Redshift with the Redshift Data API.
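A rough sketch of the two functions, assuming Python Lambdas with boto3; the bucket, manifest key, cluster, table, and IAM role below are all placeholders:

    import json
    import boto3

    s3 = boto3.client("s3")
    rsd = boto3.client("redshift-data")

    def build_manifest(event, context):
        """First Lambda: triggered by S3 event notifications, writes a COPY
        manifest listing the newly arrived objects."""
        entries = [
            {"url": f"s3://{r['s3']['bucket']['name']}/{r['s3']['object']['key']}",
             "mandatory": True}
            for r in event["Records"]  # S3 event notification records
        ]
        s3.put_object(
            Bucket="my-bucket",                    # placeholder bucket
            Key="manifests/latest.manifest",       # placeholder manifest key
            Body=json.dumps({"entries": entries}),
        )

    def load_manifest(event, context):
        """Second Lambda: runs COPY against the manifest via the Redshift Data API."""
        rsd.execute_statement(
            ClusterIdentifier="my-cluster",        # placeholder identifiers
            Database="dev",
            DbUser="awsuser",
            Sql="COPY my_table "
                "FROM 's3://my-bucket/manifests/latest.manifest' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
                "MANIFEST FORMAT AS CSV;",         # format depends on your data
        )

The second function can be invoked on a schedule (e.g. an EventBridge rule), which covers the "periodically" part of the question.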
Upvotes: 0
Reputation: 163
The Kinesis option works only if Redshift is publicly accessible.
Upvotes: 0
Reputation: 91
AWS Lambda Redshift Loader
is a good solution that runs a COPY command on Redshift whenever a new file appears in a pre-configured location on Amazon S3.
Links:
https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/
https://github.com/awslabs/aws-lambda-redshift-loader
Upvotes: 2
Reputation: 2729
I believe Kinesis Firehose is the simplest way to get this done. Simply create a Kinesis Firehose delivery stream, point it at a specific table in your Redshift cluster, write data to the stream, done :)
Full setup procedure here: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/event-publishing-redshift-firehose-stream.html
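Once the delivery stream exists and points at your cluster, writing data is one API call; Firehose stages the records to S3 and issues the COPY into Redshift for you. A minimal sketch with boto3, where the stream name and payload are placeholders:

    import json
    import boto3

    firehose = boto3.client("firehose")

    # Newline-delimited JSON is the usual convention for Redshift-bound
    # streams, since Firehose concatenates records before staging to S3.
    record = {"user_id": 42, "event": "signup"}    # placeholder payload
    firehose.put_record(
        DeliveryStreamName="my-redshift-stream",   # placeholder stream name
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )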
Upvotes: 1
Reputation: 6333
Try using AWS Data Pipeline, which has various templates for moving data from one AWS service to another. The "Load data from S3 into Redshift" template copies data from an Amazon S3 folder into a Redshift table. You can load the data into an existing table or provide a SQL query to create the table. The Redshift table must have the same schema as the data in Amazon S3.
Data Pipeline supports running pipelines on a schedule, and it has a cron-style editor for scheduling.
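For what it's worth, the same thing can be scripted with boto3 instead of the console. A minimal sketch of the scheduling skeleton, where the pipeline name is a placeholder and the actual copy objects (S3DataNode, RedshiftDataNode, RedshiftCopyActivity) would be filled in from the template:

    import boto3

    dp = boto3.client("datapipeline")

    pipeline_id = dp.create_pipeline(
        name="s3-to-redshift-daily",               # placeholder name
        uniqueId="s3-to-redshift-daily-v1",        # idempotency token
    )["pipelineId"]

    dp.put_pipeline_definition(
        pipelineId=pipeline_id,
        pipelineObjects=[
            {   # run once per day, starting at activation time
                "id": "DefaultSchedule",
                "name": "Every1Day",
                "fields": [
                    {"key": "type", "stringValue": "Schedule"},
                    {"key": "period", "stringValue": "1 day"},
                    {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
                ],
            },
            {   # default object that attaches every activity to the schedule
                "id": "Default",
                "name": "Default",
                "fields": [
                    {"key": "scheduleType", "stringValue": "cron"},
                    {"key": "schedule", "refValue": "DefaultSchedule"},
                ],
            },
            # ... plus the S3DataNode / RedshiftDataNode / RedshiftCopyActivity
            # objects that the "Load data from S3 into Redshift" template generates
        ],
    )

    dp.activate_pipeline(pipelineId=pipeline_id)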
Upvotes: 2