Laurynas Stašys

Reputation: 338

Scheduling the export of job output from HDFS to S3

I am trying to figure out how to export data from HDFS that is written by an Apache Spark Streaming job. The following diagram shows the solution architecture:

Solution architecture

Apache Spark runs a streaming job on an AWS EMR cluster and stores the results in HDFS. The streaming job collects data once every hour using window functions and performs computations. I need to export these results to S3 and RDS, which I can do easily by running S3DistCp and Sqoop commands. However, I want these commands to run exactly once, each time a computation completes. I would like to do this more gracefully, with something other than a cron job.

Any ideas?

Thank you

Upvotes: 0

Views: 250

Answers (1)

Michel Lemay

Reputation: 2094

You can post a message to an SQS queue when each computation completes and run the export from a Lambda function triggered by that queue.
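A minimal sketch of that idea in Python with boto3, under several assumptions: the Spark driver calls `notify_window_complete` once each hourly window has been written, the Lambda function has an SQS trigger, and the Lambda submits S3DistCp as an EMR step instead of copying the data itself. The function names, the message fields (`window_end`, `hdfs_path`), and the environment variables `EMR_CLUSTER_ID` and `EXPORT_BUCKET_URI` are all hypothetical.

```python
import json


def build_export_message(window_end: str, hdfs_path: str) -> str:
    """Serialize a 'window complete' notification for SQS (hypothetical schema)."""
    return json.dumps({"window_end": window_end, "hdfs_path": hdfs_path})


def build_s3distcp_step(hdfs_path: str, s3_dest: str) -> dict:
    """Describe an EMR step that runs s3-dist-cp through command-runner.jar."""
    return {
        "Name": f"export {hdfs_path}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", f"--src={hdfs_path}", f"--dest={s3_dest}"],
        },
    }


def notify_window_complete(queue_url: str, window_end: str, hdfs_path: str) -> None:
    """Assumed to be called from the Spark driver after a window is written."""
    import boto3  # imported lazily so the helpers above need no AWS SDK

    boto3.client("sqs").send_message(
        QueueUrl=queue_url,
        MessageBody=build_export_message(window_end, hdfs_path),
    )


def lambda_handler(event, context):
    """Lambda entry point for an SQS trigger: submits one EMR step per message."""
    import os

    import boto3

    steps = []
    for record in event["Records"]:
        msg = json.loads(record["body"])
        # EXPORT_BUCKET_URI is an assumed setting, e.g. an s3:// prefix.
        dest = f"{os.environ['EXPORT_BUCKET_URI']}/{msg['window_end']}"
        steps.append(build_s3distcp_step(msg["hdfs_path"], dest))

    boto3.client("emr").add_job_flow_steps(
        JobFlowId=os.environ["EMR_CLUSTER_ID"],  # assumed setting
        Steps=steps,
    )
```

Keep in mind that a standard SQS queue delivers at least once, so for a strict exactly-once export you would probably want a FIFO queue with deduplication, or an export that is safe to repeat. The Sqoop export to RDS could be submitted as a second step in the same way.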

Upvotes: 1
