ExploringApple

Reputation: 1482

How to automate ETL job deployment and run?

We have ETL jobs: a Java jar (which performs the ETL operations) is run via a shell script. The shell script is passed parameters depending on the job being run. These shell scripts are run via crontab as well as manually, depending on requirements. Sometimes there is also a need to run some SQL commands/scripts on a PostgreSQL RDS DB before the shell script runs.

We have everything on AWS: an EC2 Talend server, PostgreSQL RDS, Redshift, Ansible, etc. How can we automate this process? How do we deploy the jobs and handle passing custom parameters, etc.? Pointers are welcome.
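
For reference, the current manual flow is roughly like the sketch below (a minimal Python illustration only; the connection details, SQL file, script path and parameters are placeholder names, not our real ones):

    import subprocess
    import psycopg2  # assuming the driver is available on the EC2 box

    # Placeholder connection details and file names, for illustration only.
    RDS_DSN = "host=my-rds-endpoint dbname=etl_db user=etl_user password=..."
    PRE_SQL = "pre_load_cleanup.sql"

    def run_pre_sql():
        """Run the SQL script against the PostgreSQL RDS DB before the ETL job."""
        with psycopg2.connect(RDS_DSN) as conn, conn.cursor() as cur:
            cur.execute(open(PRE_SQL).read())

    def run_etl(job_name, run_date):
        """Invoke the existing shell wrapper around the ETL jar with parameters."""
        subprocess.run(["/opt/etl/run_etl.sh", job_name, run_date], check=True)

    if __name__ == "__main__":
        run_pre_sql()
        run_etl("daily_sales_load", "2019-01-01")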

Upvotes: 2

Views: 2263

Answers (2)

Yuva

Reputation: 3173

I would prefer to go with AWS Data Pipeline and add steps to perform any pre/post operations around your ETL job, such as running shell scripts or any HQL, etc.
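
For illustration, a minimal boto3 sketch of such a pipeline is below: a SqlActivity runs the pre-step SQL on the PostgreSQL RDS DB, and a ShellCommandActivity then calls the existing ETL shell script on the EC2 box via a worker group. All names, roles, schedules, scripts and credentials are placeholders, so treat it as a starting point rather than a working definition.

    import boto3

    dp = boto3.client("datapipeline")

    pipeline_id = dp.create_pipeline(name="etl-daily", uniqueId="etl-daily-v1")["pipelineId"]

    objects = [
        # Pipeline defaults: daily cron-style schedule and IAM roles.
        {"id": "Default", "name": "Default", "fields": [
            {"key": "scheduleType", "stringValue": "cron"},
            {"key": "schedule", "refValue": "DailySchedule"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            {"key": "pipelineLogUri", "stringValue": "s3://my-etl-logs/"},
        ]},
        {"id": "DailySchedule", "name": "DailySchedule", "fields": [
            {"key": "type", "stringValue": "Schedule"},
            {"key": "period", "stringValue": "1 day"},
            {"key": "startDateTime", "stringValue": "2019-01-01T02:00:00"},
        ]},
        # Connection to the PostgreSQL RDS instance.
        {"id": "RdsDb", "name": "RdsDb", "fields": [
            {"key": "type", "stringValue": "RdsDatabase"},
            {"key": "rdsInstanceId", "stringValue": "my-postgres-instance"},
            {"key": "databaseName", "stringValue": "etl_db"},
            {"key": "username", "stringValue": "etl_user"},
            {"key": "*password", "stringValue": "change-me"},
        ]},
        # Pre-step: SQL on RDS, picked up by Task Runner on the EC2 worker group.
        {"id": "PreSql", "name": "PreSql", "fields": [
            {"key": "type", "stringValue": "SqlActivity"},
            {"key": "database", "refValue": "RdsDb"},
            {"key": "script", "stringValue": "DELETE FROM staging.sales WHERE load_date = CURRENT_DATE;"},
            {"key": "workerGroup", "stringValue": "etl-ec2-workers"},
        ]},
        # Main step: the existing shell script, with parameters built from the schedule.
        {"id": "RunEtl", "name": "RunEtl", "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "/opt/etl/run_etl.sh daily_sales_load #{format(@scheduledStartTime,'YYYY-MM-dd')}"},
            {"key": "dependsOn", "refValue": "PreSql"},
            {"key": "workerGroup", "stringValue": "etl-ec2-workers"},
        ]},
    ]

    dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
    dp.activate_pipeline(pipelineId=pipeline_id)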

AWS Glue runs on the Spark engine, and it has other features as well, such as the AWS Glue development endpoint, crawlers, the Data Catalog, and job schedulers. I think AWS Glue would be ideal if you are starting afresh or plan to move your ETL to AWS Glue. Please refer here for a price comparison.

AWS Data Pipeline: for details on AWS Data Pipeline

AWS Glue FAQ: for details on supported languages for AWS Glue

Please note according to AWS Glue FAQ:

Q: What programming language can I use to write my ETL code for AWS Glue?

You can use either Scala or Python.
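
For example, a minimal Python Glue job could look like the sketch below. Custom parameters are passed as job arguments (e.g. --RUN_DATE=2019-01-01) and read with getResolvedOptions; the catalog database, table and connection names here are placeholders:

    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Job arguments, including a custom RUN_DATE parameter.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "RUN_DATE"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table that a crawler registered in the Glue Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="etl_db", table_name="raw_sales"
    )

    # ... transformations would go here ...

    # Write the result to Redshift through a catalog connection.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=source,
        catalog_connection="redshift-connection",
        connection_options={"dbtable": "public.sales", "database": "analytics"},
        redshift_tmp_dir="s3://my-temp-bucket/glue/",
    )

    job.commit()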

Edit: As Jon Scott commented, Apache Airflow is another option for job scheduling, but I have not used it.
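
For completeness, a minimal Airflow DAG for the flow in the question might look like the sketch below (the connection id, script path, schedule and parameters are all assumptions): a PostgresOperator runs the pre-step SQL on RDS, then a BashOperator calls the existing shell script with parameters.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.postgres_operator import PostgresOperator

    dag = DAG(
        dag_id="daily_etl",
        start_date=datetime(2019, 1, 1),
        schedule_interval="0 2 * * *",  # replaces the crontab entry
        catchup=False,
    )

    # Pre-step: run SQL on the PostgreSQL RDS DB (connection configured in Airflow).
    pre_sql = PostgresOperator(
        task_id="pre_sql",
        postgres_conn_id="rds_postgres",
        sql="sql/pre_load_cleanup.sql",
        dag=dag,
    )

    # Main step: call the existing shell wrapper around the ETL jar with parameters.
    run_etl = BashOperator(
        task_id="run_etl",
        bash_command="/opt/etl/run_etl.sh daily_sales_load {{ ds }}",
        dag=dag,
    )

    pre_sql >> run_etl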

Upvotes: 3

Kishore Bharathy

Reputation: 451

You can use AWS Glue for performing serverless ETL. Glue also has triggers, which let you automate your jobs.
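
For example, a scheduled trigger can be created with boto3 roughly as below; the job name, schedule and arguments are placeholders:

    import boto3

    glue = boto3.client("glue")

    # Run an existing Glue job every night and pass custom parameters as job arguments.
    glue.create_trigger(
        Name="daily-etl-trigger",
        Type="SCHEDULED",
        Schedule="cron(0 2 * * ? *)",  # 02:00 UTC daily
        Actions=[{
            "JobName": "daily_sales_load",
            "Arguments": {"--RUN_DATE": "2019-01-01"},
        }],
        StartOnCreation=True,
    )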

Upvotes: 0
