Ankur Garg

Reputation: 2853

Stream data from S3 bucket to redshift periodically

I have some data stored in S3. I need to clone/copy this data periodically from S3 to a Redshift cluster. For a bulk copy, I can use the COPY command to load from S3 into Redshift.
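For reference, this is roughly what the one-off bulk load looks like (a minimal sketch over psycopg2; the table name, bucket path, IAM role, and connection details are all placeholders):

```python
import psycopg2

# Placeholder connection details and object names.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="...",
)

with conn, conn.cursor() as cur:
    # One-off bulk load of everything under the S3 prefix into the target table.
    cur.execute("""
        COPY events
        FROM 's3://my-bucket/data/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS CSV;
    """)
```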

Similarly, is there a trivial way to copy data from S3 to Redshift periodically?

Thanks

Upvotes: 1

Views: 4847

Answers (5)

Xoxo

Reputation: 31

You can use the COPY command with Lambda. You can configure two Lambdas: one creates a manifest file for your incoming new data, and the other reads that manifest and loads it into Redshift with the Redshift Data API.
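A minimal sketch of the second (loading) Lambda, assuming the first one has already written a manifest; the cluster, database, table, manifest path, and IAM role names below are placeholders:

```python
import boto3

# Redshift Data API client; runs SQL without managing a database connection.
client = boto3.client("redshift-data")

def handler(event, context):
    # Issue a COPY against the manifest written by the first Lambda.
    client.execute_statement(
        ClusterIdentifier="my-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql="""
            COPY events
            FROM 's3://my-bucket/manifests/latest.manifest'
            IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
            MANIFEST;
        """,
    )
```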

Upvotes: 0

The Kinesis option works only if Redshift is publicly accessible.

Upvotes: 0

Arnab Sarkar

Reputation: 91

AWS Lambda Redshift Loader is a good solution: it runs a COPY command on Redshift whenever a new file appears in a pre-configured location on Amazon S3.

Links:

https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/
https://github.com/awslabs/aws-lambda-redshift-loader
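Under the hood, the loader is triggered by S3 event notifications. Conceptually the wiring looks like this (a sketch, not the loader's own setup script, which configures this for you; bucket name and function ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Fire the loader Lambda whenever an object lands under the watched prefix.
s3.put_bucket_notification_configuration(
    Bucket="my-data-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:aws-lambda-redshift-loader",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "incoming/"}]}
                },
            }
        ]
    },
)
```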

Upvotes: 2

Julien Simon

Reputation: 2729

I believe Kinesis Firehose is the simplest way to get this done. Simply create a Kinesis Firehose delivery stream, point it at a specific table in your Redshift cluster, write data to the stream, done :)

Full setup procedure here: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/event-publishing-redshift-firehose-stream.html
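Once the stream is set up, producers just write records to it, and Firehose handles the staging in S3 and the COPY into Redshift. A minimal sketch, assuming a delivery stream named s3-to-redshift already exists:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Write one JSON record; Firehose buffers to S3 and COPYs into Redshift for you.
record = {"user_id": 42, "event": "click"}
firehose.put_record(
    DeliveryStreamName="s3-to-redshift",  # placeholder stream name
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```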

Upvotes: 1

omuthu

Reputation: 6333

Try using AWS Data Pipeline, which has various templates for moving data from one AWS service to another. The "Load data from S3 into Redshift" template copies data from an Amazon S3 folder into a Redshift table. You can load the data into an existing table or provide a SQL query to create the table. The Redshift table must have the same schema as the data in Amazon S3.

Data Pipeline also supports running pipelines on a schedule, with a cron-style editor for scheduling.
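A rough sketch of driving the schedule from boto3; the template's actual copy activities are normally filled in from the console, so only the daily Schedule wiring is shown here (all names are placeholders):

```python
import boto3

dp = boto3.client("datapipeline")

# Create an empty pipeline shell; the S3-to-Redshift template objects
# would be added alongside the schedule below.
pipeline = dp.create_pipeline(
    name="s3-to-redshift-daily", uniqueId="s3-to-redshift-daily"
)

# Attach a cron-style schedule that fires once a day.
dp.put_pipeline_definition(
    pipelineId=pipeline["pipelineId"],
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "cron"},
                {"key": "schedule", "refValue": "DefaultSchedule"},
            ],
        },
        {
            "id": "DefaultSchedule",
            "name": "Every 1 day",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": "1 day"},
                {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
            ],
        },
    ],
)

dp.activate_pipeline(pipelineId=pipeline["pipelineId"])
```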

Upvotes: 2
