L Xandor

Reputation: 1841

Pipeline for parsing daily JSON data with AWS?

JSON files are posted daily to an S3 bucket. I want to take each JSON file, do some processing on it, then post the data to a new S3 bucket, where it will get picked up and stored in Redshift. What would be the recommended AWS pipeline for this? An AWS Lambda that triggers when a new JSON file lands in S3 and then kicks off something like an AWS Batch job? Or something else? I am not familiar with all of AWS's services, so I might be overlooking something obvious.
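As a sketch of the trigger step (assuming a Python Lambda; the bucket and key names below are placeholders), the handler would pull the new object's location out of the standard S3 put-event notification:

```python
def get_new_object(event):
    """Extract the bucket and key from an S3 put-event notification.

    The event shape below is the standard payload S3 sends to a
    Lambda subscribed to object-created events (one record per object).
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    return bucket, key

# Example S3 put-event, trimmed to the fields used above
# (bucket and key names are illustrative).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "incoming-json"},
                "object": {"key": "2023/daily.json"}}}
    ]
}

print(get_new_object(sample_event))  # ('incoming-json', '2023/daily.json')
```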

So the flow looks like this:

S3 bucket -> data processing -> S3 bucket -> Redshift

and it's the data processing step I'm not sure about: how to schedule something fairly scalable that runs daily, processes efficiently, and puts the data back. The processing is parsing the JSON, plus some aggregation and data clean-up.
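For illustration, the parse/aggregate/clean-up step might look like the following sketch. The field names (`user`, `amount`) are hypothetical, and the boto3 S3 read/write calls are shown only in comments so the transform itself stays self-contained:

```python
import json
from collections import defaultdict

def process(raw_json):
    """Parse a daily JSON file, drop incomplete records, and
    aggregate amounts per user (field names are illustrative)."""
    records = json.loads(raw_json)
    totals = defaultdict(float)
    for rec in records:
        # Clean-up: skip records missing required fields.
        if rec.get("user") is None or rec.get("amount") is None:
            continue
        totals[rec["user"]] += float(rec["amount"])
    # Emit one JSON object per line, a shape Redshift COPY can load.
    return "\n".join(
        json.dumps({"user": u, "total": t}) for u, t in sorted(totals.items())
    )

# Inside the Lambda handler you would wrap this with boto3, e.g.:
#   body = s3.get_object(Bucket=src_bucket, Key=key)["Body"].read()
#   s3.put_object(Bucket=dst_bucket, Key=key, Body=process(body))

raw = json.dumps([
    {"user": "a", "amount": 2},
    {"user": "a", "amount": 3},
    {"user": "b"},            # incomplete record -> dropped
])
print(process(raw))  # {"user": "a", "total": 5.0}
```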

Upvotes: 0

Views: 204

Answers (1)

Jeremy Thompson

Reputation: 65594

and it's the data processing step I'm not sure about - how to schedule something fairly scalable that runs daily and efficiently and puts the data back.

Don't worry about scalability with Lambda; just focus on keeping the jobs short-running. Here is an example: https://docs.aws.amazon.com/lambda/latest/dg/with-scheduledevents-example.html

I think one piece of the puzzle you're missing is the documentation for Schedule Expressions Using Rate or Cron: https://docs.aws.amazon.com/lambda/latest/dg/with-scheduledevents-example.html
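If you go the scheduled route rather than the S3 trigger, the rate or cron expression attaches to the function like this (a minimal AWS SAM template sketch; the function name, handler, and runtime are placeholders):

```yaml
# Minimal AWS SAM sketch: run the processing function once a day.
Resources:
  DailyJsonProcessor:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.9
      Events:
        Daily:
          Type: Schedule
          Properties:
            Schedule: rate(1 day)  # or cron(0 6 * * ? *) for 06:00 UTC daily
```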

Upvotes: 2
