Reputation: 8374
I would like to use AWS Data Pipeline to execute an ETL process. Suppose my process has a small input file and I would like to use a custom jar or Python script to make the data transformations. I don't see any reason to spin up an EMR cluster for such a simple data step, so I would like to execute it on a single EC2 instance.
Looking at the AWS Data Pipeline EmrActivity object, I only see the option to run using an EMR cluster. Is there a way to run a computation step on an EC2 instance? Is that the best solution for this use case, or is it better to set up a small EMR cluster (with a single node) and execute a Hadoop job?
Upvotes: 2
Views: 585
Reputation: 2068
If you don't need the EMR cluster or the Hadoop framework, and your execution can easily run on a single instance, then you can use a ShellCommandActivity associated with an Ec2Resource (an instance) to perform the work. A simple example is at http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-getting-started.html
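A minimal pipeline-definition sketch of that pattern might look like the following. The bucket names, script path, and instance type here are placeholders, and the IAM roles assume the default roles Data Pipeline creates for you:

```json
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "scheduleType": "ondemand",
      "failureAndRerunMode": "CASCADE",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole"
    },
    {
      "id": "MyEc2Instance",
      "name": "MyEc2Instance",
      "type": "Ec2Resource",
      "instanceType": "t1.micro",
      "terminateAfter": "30 Minutes"
    },
    {
      "id": "TransformStep",
      "name": "TransformStep",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEc2Instance" },
      "command": "python /home/ec2-user/transform.py s3://my-bucket/input.csv s3://my-bucket/output/"
    }
  ]
}
```

The ShellCommandActivity just runs an arbitrary shell command on the instance referenced by `runsOn`, so it can invoke your Python script or a `java -jar` call directly, and `terminateAfter` ensures the instance is shut down when the activity finishes.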
Upvotes: 2