Nobody

Reputation: 1

What is the best way to copy large CSV files from S3 to Redshift?

I'm working on a task of copying CSV files from an S3 bucket to Redshift. I've found multiple ways to do this, but I'm not sure which one is best. Here's the scenario:

At regular intervals, multiple CSV files of around 500 MB - 1 GB each will be added to my S3 bucket. The data can contain duplicates. The task is to copy the data to a Redshift table while ensuring that no duplicate data ends up in Redshift.

Here are the ways I found which can be used:

  1. Create an AWS Lambda function that is triggered whenever a file is added to the S3 bucket.

  2. Use AWS Kinesis

  3. Use AWS Glue

I understand Lambda should not be used for jobs that take more than 5 minutes. So should I use it, or just eliminate this option?

Kinesis can handle large amounts of data, but is it the best way to do this?

I'm not familiar with Glue or Kinesis, but I've read that Glue can be slow.

If anyone can point me in the right direction, it would be really helpful.

Upvotes: 0

Views: 1413

Answers (1)

benfarr

Reputation: 166

You can definitely make this work with Lambda if you leverage Step Functions and the S3 Select option to filter subsets of the data into smaller chunks. You'd have Step Functions manage your ETL orchestration: a pre-process state (see the links below) could determine the execution requirements and then execute multiple Lambdas, even in parallel if you wish, each selectively pulling a subset of the large data file via S3 Select. Those Lambdas would process their subsets to remove duplicates and perform any other ETL operations you might require. Then you'd take the processed data and write it to Redshift. Here's a rough sketch of what one of those worker Lambdas could look like, followed by links that will help you put the architecture together:
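This is only a sketch, not production code: the event shape (bucket, key, id range) is whatever your state machine passes in, and the `id` column used for de-duplication is an assumed name you'd swap for your own key.

```python
# Sketch of a worker Lambda: pull a slice of a large CSV with S3 Select,
# drop in-chunk duplicates, and stage the cleaned rows back to S3.
# Assumptions: the state machine passes bucket/key/range, and the first
# CSV column is an "id" that identifies duplicates.
import csv
import io

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    bucket = event["bucket"]           # assumed input shape from Step Functions
    key = event["key"]
    start_id, end_id = event["range"]  # hypothetical partitioning by id

    # S3 Select scans only the matching rows instead of the whole 1 GB object.
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=(
            "SELECT * FROM s3object s "
            f"WHERE CAST(s.id AS INT) BETWEEN {start_id} AND {end_id}"
        ),
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    # Buffer the streamed result first, since a record can span payload chunks.
    buffer = io.StringIO()
    for stream_event in response["Payload"]:
        if "Records" in stream_event:
            buffer.write(stream_event["Records"]["Payload"].decode("utf-8"))
    buffer.seek(0)

    # Keep only the first occurrence of each id within this chunk.
    seen = set()
    rows = []
    for row in csv.reader(buffer):
        if row and row[0] not in seen:  # assuming the first column is the id
            seen.add(row[0])
            rows.append(row)

    # Write the cleaned chunk somewhere the Redshift load step can COPY from.
    out_key = f"clean/{key}-{start_id}-{end_id}.csv"
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    s3.put_object(Bucket=bucket, Key=out_key, Body=out.getvalue().encode("utf-8"))
    return {"cleanKey": out_key, "rowCount": len(rows)}
```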

Trigger State Machine Execution from S3 Event

Manage Lambda Processing Executions and workflow state

Use S3 Select to pull subsets from large data objects
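To go with the first link above, the S3-triggered Lambda that kicks off the state machine can stay tiny; this is just a sketch, and the state machine ARN (read from an environment variable here) is a placeholder.

```python
# Sketch: start one Step Functions execution per CSV object uploaded to S3.
import json
import os

import boto3

sfn = boto3.client("stepfunctions")


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # placeholder
            input=json.dumps({"bucket": bucket, "key": key}),
        )
```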

Also, here's a link to a Python ETL pipeline example for the CDK that I built. You'll see an example of an S3 event-driven Lambda along with data processing and DDB or MySQL writes. It will give you an idea of how you can build out comprehensive Lambdas for ETL operations. You would just need to add a psycopg2 layer to your deployment to talk to Redshift. Hope this helps.
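For the final write to Redshift, one common pattern (sketched below; the table names, the `id` key column, the bucket, and the IAM role ARN are all placeholders) is to COPY the cleaned file into a staging table and insert only the rows that aren't already in the target table, with psycopg2 coming from that Lambda layer.

```python
# Sketch: load a cleaned CSV from S3 into Redshift without introducing duplicates.
import os

import psycopg2


def load_to_redshift(clean_key):
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=5439,
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        # Staging table with the same columns as the target table.
        cur.execute("CREATE TEMP TABLE staging (LIKE my_table);")
        # COPY the cleaned chunk; add IGNOREHEADER 1 if your file keeps a header row.
        cur.execute(
            f"""
            COPY staging
            FROM 's3://my-bucket/{clean_key}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
            CSV;
            """
        )
        # Insert only ids that are not already present in the target table.
        cur.execute(
            """
            INSERT INTO my_table
            SELECT s.*
            FROM staging s
            LEFT JOIN my_table t ON s.id = t.id
            WHERE t.id IS NULL;
            """
        )
    conn.close()
```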

Upvotes: 1
