Reputation: 1482
We have ETL jobs, i.e. a Java jar (which performs the ETL operations) is run via a shell script. The shell script is passed parameters depending on the job being run. These shell scripts are run via crontab as well as manually, depending on the requirements. Sometimes there is also a need to run some SQL commands/scripts on the PostgreSQL RDS DB before the shell script runs.
We have everything on AWS, i.e. an EC2 Talend server, PostgreSQL RDS, Redshift, Ansible, etc. How can we automate this process? How should we deploy the jobs and handle passing custom parameters, etc.? Pointers are welcome.
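For context, a simplified sketch of the manual flow we want to automate (hostnames, script paths, credentials, and parameters below are placeholders):

```python
# Rough outline of what we do by hand today: run some SQL on the
# PostgreSQL RDS instance, then run the ETL shell script with
# job-specific parameters. All names/credentials here are placeholders.
import subprocess
import psycopg2

# pre-step: SQL commands/scripts on the RDS instance
conn = psycopg2.connect(
    host="my-rds-endpoint.amazonaws.com",
    dbname="etl_db",
    user="etl_user",
    password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(open("pre_load.sql").read())
conn.close()

# main step: the shell script that launches the Java jar,
# with parameters that vary per job
subprocess.run(
    ["/opt/etl/run_job.sh", "sales_job", "2018-01-01"],
    check=True,
)
```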
Upvotes: 2
Views: 2263
Reputation: 3173
I would prefer to go with AWS Data Pipeline and add steps to perform any pre/post operations around your ETL job, such as running shell scripts or any HQL, etc.
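If it helps, here is a rough boto3 sketch of an on-demand pipeline with a SQL pre-step followed by your existing shell script. The role names, worker group, commands, and field values are placeholders, so verify them against the Data Pipeline object reference before relying on this:

```python
# Rough sketch: create and activate an on-demand Data Pipeline whose steps
# run via Task Runner on an existing EC2 box. Worker group name, IAM roles,
# commands, and field values are placeholders to be verified.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

pipeline_id = dp.create_pipeline(name="etl-job", uniqueId="etl-job-001")["pipelineId"]

objects = [
    {   # pipeline defaults
        "id": "Default", "name": "Default",
        "fields": [
            {"key": "scheduleType", "stringValue": "ondemand"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
        ],
    },
    {   # pre-step: SQL on the PostgreSQL RDS, here simply via psql
        "id": "PreSql", "name": "PreSql",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "psql -h my-rds-endpoint -U etl_user -f /opt/etl/pre_load.sql"},
            {"key": "workerGroup", "stringValue": "etl-workers"},
        ],
    },
    {   # main step: the existing shell script with its custom parameters
        "id": "RunEtl", "name": "RunEtl",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "/opt/etl/run_job.sh sales_job 2018-01-01"},
            {"key": "workerGroup", "stringValue": "etl-workers"},
            {"key": "dependsOn", "refValue": "PreSql"},
        ],
    },
]

dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```

For this to work, Task Runner would need to be running on your EC2 box and registered with the same worker group.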
AWS Glue runs on the Spark engine and has other features as well, such as the AWS Glue Development Endpoint, Crawler, Catalog, and job scheduler. I think AWS Glue would be ideal if you are starting afresh or plan to move your ETL to AWS Glue. Please refer here for a price comparison.
AWS Data Pipeline: for details on AWS Data Pipeline
AWS Glue FAQ: for details on supported languages for AWS Glue
Please note, according to the AWS Glue FAQ:
Q: What programming language can I use to write my ETL code for AWS Glue?
You can use either Scala or Python.
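In Python, a Glue job skeleton looks roughly like the below. Custom parameters are passed to the job as `--key value` arguments and picked up with `getResolvedOptions`; the database, table, and parameter names here are made up:

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Glue passes job parameters as --KEY value; getResolvedOptions picks them up
args = getResolvedOptions(sys.argv, ["JOB_NAME", "run_date", "source_table"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# read from the Glue Data Catalog (database/table names are placeholders)
source = glue_context.create_dynamic_frame.from_catalog(
    database="etl_db", table_name=args["source_table"]
)

# ... transformations would go here ...

job.commit()
```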
Edit: As Jon scott commented, Apache Airflow is another option for job scheduling, but I have not used it.
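From its docs, a DAG for this use case would look roughly like this (untested; operator import paths vary between Airflow versions, and connection IDs, paths, and schedules are placeholders):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.postgres_operator import PostgresOperator

dag = DAG(
    "etl_job",
    start_date=datetime(2018, 1, 1),
    schedule_interval="0 2 * * *",  # same cadence as the current crontab entry
)

# pre-step: SQL on the PostgreSQL RDS (connection "etl_rds" defined in Airflow)
pre_sql = PostgresOperator(
    task_id="pre_sql",
    postgres_conn_id="etl_rds",
    sql="sql/pre_load.sql",
    dag=dag,
)

# main step: the existing shell script with its custom parameters
run_etl = BashOperator(
    task_id="run_etl",
    bash_command="/opt/etl/run_job.sh sales_job {{ ds }}",
    dag=dag,
)

pre_sql >> run_etl
```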
Upvotes: 3
Reputation: 451
You can use AWS Glue for serverless ETL. Glue also has triggers, which let you automate its jobs.
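For example, a scheduled trigger can be created with boto3 to run a Glue job on a cron schedule and pass it custom arguments (the job name, schedule, and arguments below are placeholders):

```python
# Rough sketch: a scheduled Glue trigger that starts a job with custom
# arguments (job name, schedule, and arguments are placeholders).
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_trigger(
    Name="nightly-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",          # 02:00 UTC every day
    Actions=[{
        "JobName": "etl-job",
        "Arguments": {                      # read inside the job via getResolvedOptions
            "--run_date": "2018-01-01",
            "--source_table": "sales",
        },
    }],
    StartOnCreation=True,
)
```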
Upvotes: 0