Swapnil

Reputation: 125

AWS Batch vs EC2 vs AWS Workspaces for running batch scripts to load data into Redshift

I have multiple CSV files containing data for different tables, ranging in size from 1 MB to 1.5 GB. I want to process the data row by row (replacing or removing column values) and then load it into existing Redshift tables. This is a once-a-day batch process.
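The row-by-row step described above can be sketched as a streaming transform that never holds the whole file in memory, which matters for the 1.5 GB files. This is only a minimal illustration: the function name `transform_rows`, the `replacements` mapping and the `drop_columns` parameter are hypothetical, and the actual cleaning rules and the Redshift load are not shown.

```python
import csv
import io


def transform_rows(src, dst, replacements, drop_columns=()):
    """Stream CSV rows from src to dst one at a time.

    replacements: {column_name: {old_value: new_value}} (hypothetical rule format).
    drop_columns: column names to leave out of the output.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    kept = [c for c in (reader.fieldnames or []) if c not in drop_columns]
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    count = 0
    for row in reader:
        # Replace values only in the columns that have a rule.
        for col, mapping in replacements.items():
            if col in row:
                row[col] = mapping.get(row[col], row[col])
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; real runs would pass open file objects instead.
    source = io.StringIO("id,status,notes\n1,N/A,x\n2,ok,y\n")
    target = io.StringIO()
    transform_rows(source, target, {"status": {"N/A": ""}}, drop_columns=("notes",))
    print(target.getvalue())
```

Because the function only takes file-like objects, it can be unit tested with `io.StringIO` and run unchanged on an EC2 instance or inside a container.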

  1. AWS Lambda:
    • Lambda has memory limits, so I was not able to process the large CSV files.
  2. EC2: I already have an EC2 instance where I run a Python script to transform the data and load it into Redshift.
    • I have to keep the EC2 instance running all the time. It holds all the Python scripts I want to run for all tables, plus the environment I set up (Python, the psycopg library, etc.), which leads to higher cost.
  3. AWS Batch:
    • I created a container image that has all the setup needed to run the Python scripts, and pushed it to ECR.
    • I then set up an AWS Batch job, which can take this container image and run it through ECS.
    • This is more cost-efficient: I only pay for the EC2 time used and the ECR image storage.
    • But I will have to do all the development and unit testing on my personal desktop and then push a container; there is no inline AWS service to test with.
  4. AWS Workspaces:
    • I am not very familiar with AWS Workspaces and need input: can it also be used like AWS Batch, to start and stop an instance when required, run Python scripts on it, and edit or test scripts?
    • Also, can I schedule it to run every day at a defined time?

I need input on which service is the best-suited, most optimized solution for this use case. It would also be great if anyone could suggest a better way to use the services I mentioned above.

Upvotes: 0

Views: 1662

Answers (1)

Wale

Reputation: 379

Batch is best suited for your use case. I see that your concern with Batch is having to do the development and unit testing on your personal desktop. You can automate that process using AWS ECR, CodePipeline, CodeCommit and CodeBuild. Set up a pipeline to detect changes made to your code repo, build the image and push it to ECR. Batch can pick up the latest image from there.
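Once the image is in ECR, starting a run for one table could look roughly like the sketch below, which uses the boto3 Batch client's `submit_job` call. The queue name `csv-loader-queue`, the job definition name `csv-loader`, and the `TARGET_TABLE` / `SOURCE_PATH` environment variables are all assumptions made up for illustration; substitute whatever your own setup uses.

```python
import re


def build_submit_args(table, source_path,
                      job_queue="csv-loader-queue",   # hypothetical queue name
                      job_definition="csv-loader"):   # hypothetical job definition
    """Build the keyword arguments for the Batch submit_job call."""
    # Batch job names allow only letters, digits, hyphens and underscores.
    safe_table = re.sub(r"[^A-Za-z0-9_-]", "-", table)
    return {
        "jobName": f"load-{safe_table}",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Hypothetical variables the script in the container would read.
            "environment": [
                {"name": "TARGET_TABLE", "value": table},
                {"name": "SOURCE_PATH", "value": source_path},
            ]
        },
    }


def submit_load_job(table, source_path):
    """Submit one load job and return its job ID (needs AWS credentials)."""
    import boto3  # imported here so the builder above can be tested without AWS

    batch = boto3.client("batch")
    response = batch.submit_job(**build_submit_args(table, source_path))
    return response["jobId"]
```

Keeping the argument building separate from the AWS call means that part can be unit tested on a desktop or in CodeBuild without touching AWS.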

Upvotes: 0
