Sudarshan kumar

Reputation: 1585

AWS Data Pipeline vs Lambda for EMR automation

Here are the steps for my application in AWS.

  1. Data will be loaded weekly into 35 separate S3 folders.
  2. On completion of data loading in each of the 35 folders, 35 EMR clusters will be created.
  3. Each EMR cluster will run a Spark-Scala script; the clusters will run in parallel.
  4. On completion of its job, each cluster will be terminated.

How can I achieve this?

As far as I have searched, there are two options:

  1. Invoke an AWS Lambda function on the S3 event; the Lambda will create an EMR cluster and do the spark-submit (see the sketch below this list).
  2. I read about AWS Data Pipeline.
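
For option 1, this is roughly what I imagine the Lambda could look like with boto3 (the bucket, jar path, class name, instance types, and roles are placeholders, not my real setup):

```python
import boto3

emr = boto3.client("emr")

def handler(event, context):
    # The S3 event tells me which of the 35 folders has finished loading.
    key = event["Records"][0]["s3"]["object"]["key"]
    folder = key.split("/")[0]

    # Launch a transient cluster that runs one spark-submit step and
    # terminates itself when the step finishes.
    emr.run_job_flow(
        Name="weekly-" + folder,
        ReleaseLabel="emr-5.29.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            "MasterInstanceType": "m4.xlarge",
            "SlaveInstanceType": "m4.xlarge",
            "InstanceCount": 4,
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "process-" + folder,
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit", "--deploy-mode", "cluster",
                    "--class", "com.example.WeeklyJob",    # placeholder class
                    "s3://my-bucket/jars/weekly-job.jar",  # placeholder jar
                    "s3://my-bucket/" + folder + "/",
                ],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
```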

Will AWS Data Pipeline be helpful in my scenario?

Also, I have a Spark-Scala script that I have been running in Zeppelin. If required, I can create a JAR out of it and submit that in Data Pipeline.

Please consider the cost also. I have 5 TB of data to be delivered to the client weekly.

Upvotes: 1

Views: 3934

Answers (1)

abiydv

Reputation: 621

I think you should use Data Pipeline. The pipeline will take care of creating the EMR cluster, submitting the job, and shutting the cluster down once processing is completed. You specify the steps for EMR in the "activity" section, and the "resource" section specifies the parameters of the EMR cluster (like instance type, role to use, etc.).
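
A minimal sketch of what such a definition could look like, pushed with boto3 here (the release label, instance types, roles, and jar path are placeholders you would replace with your own):

```python
import boto3

dp = boto3.client("datapipeline")

pipeline_id = dp.create_pipeline(
    name="weekly-emr-job", uniqueId="weekly-emr-job"
)["pipelineId"]

dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        # Default object: settings shared by the whole pipeline.
        {"id": "Default", "name": "Default", "fields": [
            {"key": "scheduleType", "stringValue": "ondemand"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
        ]},
        # "Resource" section: the EMR cluster to create (and tear down).
        {"id": "MyEmrCluster", "name": "MyEmrCluster", "fields": [
            {"key": "type", "stringValue": "EmrCluster"},
            {"key": "releaseLabel", "stringValue": "emr-5.29.0"},
            {"key": "applications", "stringValue": "spark"},
            {"key": "masterInstanceType", "stringValue": "m4.xlarge"},
            {"key": "coreInstanceType", "stringValue": "m4.xlarge"},
            {"key": "coreInstanceCount", "stringValue": "4"},
            {"key": "terminateAfter", "stringValue": "4 Hours"},
        ]},
        # "Activity" section: the EMR step that does the spark-submit.
        {"id": "MyEmrActivity", "name": "MyEmrActivity", "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            {"key": "runsOn", "refValue": "MyEmrCluster"},
            {"key": "step", "stringValue":
                "command-runner.jar,spark-submit,--deploy-mode,cluster,"
                "--class,com.example.WeeklyJob,s3://my-bucket/jars/weekly-job.jar"},
        ]},
    ],
)

dp.activate_pipeline(pipelineId=pipeline_id)
```

The `runsOn` reference is what ties the activity to the cluster, so the pipeline knows to spin the cluster up before the step and terminate it afterwards.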

You can even configure an alert to send you an email via SNS if the pipeline fails for some reason, for example:
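
```python
# Objects you could append to the pipelineObjects list above
# (the topic ARN is made up).
alarm_objects = [
    {"id": "FailureAlarm", "name": "FailureAlarm", "fields": [
        {"key": "type", "stringValue": "SnsAlarm"},
        {"key": "topicArn", "stringValue": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"},
        {"key": "subject", "stringValue": "Weekly EMR pipeline failed"},
        {"key": "message", "stringValue": "Pipeline step #{node.name} failed."},
    ]},
]
# ...and on MyEmrActivity add: {"key": "onFail", "refValue": "FailureAlarm"}
```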

Now, coming to how to trigger the pipeline: if the data comes in at predetermined times, you could consider using a "schedule" in the pipeline. The pipeline will then activate at the specified time every day/week/month, for example:
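
```python
# A weekly schedule object (the start time is made up) to add to the
# definition above, replacing the on-demand settings.
schedule_objects = [
    {"id": "WeeklySchedule", "name": "WeeklySchedule", "fields": [
        {"key": "type", "stringValue": "Schedule"},
        {"key": "period", "stringValue": "1 weeks"},
        {"key": "startDateTime", "stringValue": "2019-01-07T00:00:00"},
    ]},
]
# ...and on the Default object use:
#   {"key": "scheduleType", "stringValue": "cron"},
#   {"key": "schedule", "refValue": "WeeklySchedule"},
```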

Upvotes: 2
