Reputation: 11
I'm trying to decide between AWS Glue and Fargate, and would appreciate any advice or previous experience.
Our use case is to move data from S3 to RDS on a schedule (likely every hour). We expect around 200-400k files per hour, each of them small (likely 5-20KB), which adds up to as much as 8GB in total. In a burst, though, we could see up to 1 million messages in an hour, which comes to about 20GB. The files in S3 are in JSON format, and we would like to apply some transformation and then batch-write to RDS. There will be multiple S3 buckets, and each will write to a different RDS table. We have an existing Java library for the message transformation and SQL statement generation, which we would like to reuse if possible.
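To make the sizing above easier to check, here is a small back-of-the-envelope sketch. It only uses the figures from the question (file counts and the 20KB upper bound per file); the decimal unit conversion and the function names are my own assumptions.

```python
# Rough sizing from the figures in the question.
# Assumption: decimal units, i.e. 1 GB = 10**6 KB.
KB_PER_GB = 10**6


def hourly_volume_gb(files_per_hour: int, kb_per_file: float) -> float:
    """Total data per hour in GB for a given file count and file size."""
    return files_per_hour * kb_per_file / KB_PER_GB


def files_per_second(files_per_hour: int) -> float:
    """Average arrival rate if files were spread evenly over the hour."""
    return files_per_hour / 3600


normal_peak_gb = hourly_volume_gb(400_000, 20)   # upper end of the normal case
burst_gb = hourly_volume_gb(1_000_000, 20)       # burst case

print(normal_peak_gb, burst_gb)  # 8.0 20.0
```

The data volume itself is modest; the object count (a few hundred files per second in a burst) is probably the number that matters more when comparing the two options.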
So currently I'm considering 2 paths: Glue or Fargate.
With Glue, we can use a Scala script that depends on our Java library for the transformation. This way, we could leverage (1) Glue triggers to schedule the job, (2) Spark for distributed processing, and (3) Glue job bookmarks to automatically detect which data still needs to be processed. The downside is that it might be a bit more expensive.
With Fargate, it's definitely still doable, but we would need to implement all three of the above ourselves. In exchange, we get more flexibility and a lower cost.
Overall, if anyone has similar experience, what do you think of these two choices in terms of the development and maintenance effort needed?
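As a rough idea of what "doing it ourselves" on Fargate could involve, here is a minimal sketch of two of the pieces: picking up only objects newer than a stored checkpoint, and grouping them into batches for the RDS writes. This is a simplified stand-in, not how Glue job bookmarks are implemented. `S3Object`, `select_unprocessed`, and `batched` are hypothetical names, and the object listing is assumed to have been fetched already.

```python
from datetime import datetime, timezone
from itertools import islice
from typing import Iterable, Iterator, List, NamedTuple


class S3Object(NamedTuple):
    """Hypothetical record for one listed object (key plus last-modified time)."""
    key: str
    last_modified: datetime


def select_unprocessed(objects: Iterable[S3Object], checkpoint: datetime) -> List[S3Object]:
    """Return objects modified strictly after the checkpoint, oldest first."""
    newer = [o for o in objects if o.last_modified > checkpoint]
    return sorted(newer, key=lambda o: o.last_modified)


def batched(items: Iterable[S3Object], size: int) -> Iterator[List[S3Object]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


# Example: three objects, checkpoint at 01:00 UTC.
listing = [
    S3Object("a.json", datetime(2024, 1, 1, 0, 30, tzinfo=timezone.utc)),
    S3Object("b.json", datetime(2024, 1, 1, 1, 15, tzinfo=timezone.utc)),
    S3Object("c.json", datetime(2024, 1, 1, 1, 45, tzinfo=timezone.utc)),
]
checkpoint = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)

pending = select_unprocessed(listing, checkpoint)
batches = list(batched(pending, size=1))
# After a successful run, the new checkpoint would be pending[-1].last_modified.
```

One caveat with a pure timestamp checkpoint: objects that share the checkpoint's exact timestamp, or that become visible late, can be skipped, so a real implementation would likely also track processed keys or use an overlap window.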
Upvotes: 0
Views: 1431
Reputation: 2018
Have you done a deep cost analysis? Glue can get a bit pricey and Fargate is fairly low cost. The highest cost here could be the listBucket requests, which both Glue and Fargate would likely use (unless the names are predictable?).
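For the cost analysis, a quick way to estimate the listing side is to count how many list calls an hourly run needs. The sketch below relies on one fact, that a single S3 ListObjectsV2 call returns at most 1,000 keys. The price is passed in as a parameter and the value used here is a placeholder assumption, so check the current S3 pricing for your region. Per-object GET requests are billed separately and are not counted here.

```python
import math

MAX_KEYS_PER_LIST_CALL = 1000  # ListObjectsV2 returns at most 1,000 keys per call


def list_calls_needed(object_count: int, keys_per_call: int = MAX_KEYS_PER_LIST_CALL) -> int:
    """Number of paginated list calls needed to enumerate `object_count` keys."""
    return math.ceil(object_count / keys_per_call)


def request_cost(request_count: int, price_per_1000: float) -> float:
    """Cost of `request_count` requests at a given price per 1,000 requests."""
    return request_count / 1000 * price_per_1000


# Placeholder price per 1,000 LIST requests; an assumption, not a quoted rate.
ASSUMED_LIST_PRICE_PER_1000 = 0.005

normal_calls = list_calls_needed(400_000)    # 400 calls per hourly run
burst_calls = list_calls_needed(1_000_000)   # 1000 calls per hourly run
normal_list_cost = request_cost(normal_calls, ASSUMED_LIST_PRICE_PER_1000)
```

Note that this assumes each run lists only the new objects (for example under an hourly prefix). If every run has to list the whole bucket, the call count grows with the total number of objects retained, which is where predictable names help.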
What is the source of the S3 files? Can they be piped to RDS using Firehose? You could also split off a backup to S3 so the files keep landing there, if you like. There are always a few ways to do this type of work. Another way would be to trigger a Java Lambda from the files landing in S3, so that it just continuously copies them over.
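For the Lambda option, the handler mostly needs to pull the bucket and key out of the S3 event notification. Here is a minimal sketch of that part, written in Python for brevity (a Java handler receives the same event shape). `extract_s3_objects` and `handler` are hypothetical names, and the transform and RDS write are left as a placeholder.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces as '+').
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event: Dict[str, Any], context: Any) -> int:
    """Hypothetical Lambda entry point; returns the number of objects seen."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        # Placeholder: fetch the object, transform it, and write to RDS here.
        print(f"would process s3://{bucket}/{key}")
    return len(objects)


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "bucket-a"},
                "object": {"key": "incoming/my+file%3A1.json"}}}
    ]
}
```

At a few hundred files per second, one invocation per file means many small writes and many database connections, so some buffering in front of RDS (for example an SQS queue consumed in batches) may be worth considering with this approach.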
Upvotes: 0