Reputation: 338
I am trying to figure out how to export data from HDFS that is output by an Apache Spark Streaming job. The following diagram shows the solution architecture:
Apache Spark runs a streaming job in an AWS EMR cluster and stores its results in HDFS. The streaming job collects data once every hour using window functions and performs computations on it. I need to export these results to S3 and RDS, which I can do easily by running S3DistCp and Sqoop commands; however, I want them to run exactly once, each time a computation completes. I would like to do this more gracefully than with a cron job.
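For context, here is a minimal PySpark sketch of the kind of job I mean (the source, window size, and paths are just placeholders), with the point marked where the export would need to fire:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="hourly-aggregation")
ssc = StreamingContext(sc, 60)  # 60s micro-batches (placeholder)

# Placeholder source; the real job reads from its actual input stream.
lines = ssc.socketTextStream("localhost", 9999)

# One-hour tumbling window, matching the hourly computation described above.
hourly = lines.window(windowDuration=3600, slideDuration=3600)

def process(time, rdd):
    if rdd.isEmpty():
        return
    path = "hdfs:///results/%s" % time.strftime("%Y%m%d%H")
    rdd.saveAsTextFile(path)  # hourly result lands in HDFS here
    # <-- this is the point where S3DistCp and Sqoop would need to run,
    #     exactly once per completed window

hourly.foreachRDD(process)
ssc.start()
ssc.awaitTermination()
```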
Any ideas?
Thank you
Upvotes: 0
Views: 250
Reputation: 2094
You can post a message to an SQS queue and do the work in a Lambda function.
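A minimal sketch of that idea, assuming the Spark driver posts to SQS right after each hourly result is written to HDFS, and a Lambda subscribed to that queue submits S3DistCp and Sqoop as EMR steps (the queue URL, cluster id, bucket, and JDBC settings below are placeholders):

```python
import json
import boto3

sqs = boto3.client("sqs")
emr = boto3.client("emr")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/export-jobs"  # placeholder
CLUSTER_ID = "j-XXXXXXXXXXXX"                                               # placeholder

def notify_export(hdfs_path):
    """Called from the Spark driver right after an hourly result is written to HDFS."""
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"hdfs_path": hdfs_path}))

def lambda_handler(event, context):
    """Triggered by the SQS queue; submits S3DistCp and Sqoop as EMR steps."""
    for record in event["Records"]:
        hdfs_path = json.loads(record["body"])["hdfs_path"]
        emr.add_job_flow_steps(
            JobFlowId=CLUSTER_ID,
            Steps=[
                {
                    "Name": "copy hourly result to S3",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["s3-dist-cp", "--src", hdfs_path,
                                 "--dest", "s3://my-bucket/results/"],  # placeholder bucket
                    },
                },
                {
                    "Name": "export hourly result to RDS",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["sqoop", "export",
                                 "--connect", "jdbc:mysql://my-rds-endpoint:3306/mydb",  # placeholder
                                 "--table", "results",
                                 "--export-dir", hdfs_path],
                    },
                },
            ],
        )
```

The Lambda's execution role needs elasticmapreduce:AddJobFlowSteps, and the queue has to be configured as the function's event source; Sqoop must be installed on the EMR cluster for the second step to work.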
Upvotes: 1