How to run query in AWS Glue

Question

I have the data I need in AWS Redshift. It is in a database called Lz (Landingzone). That data inserted into a database called Stage (same redshift). I have the (select) queries that transform the data from it's raw state (LZ) into the dimensions and facts that I need. Traditionally I would use these queries combined with an insert to transform the data. The result would be merged into the datawarehouse (again: same redshift)

How do I do this in Glue? Can I execute the queries in a job using python? Or can i create stored procedures in redshift that are executed by/from jobs?

Jon Scott · Accepted Answer

I do not think you can run redshift sql in a straighforward and scheduled way from glue at the moment (unfortunately). This is a major omission from the product in my opinion.

There is an option to use pre/post statements as described in this article. this is pretty basic and may not meet your needs.

https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/

You can use a "proper" scheduling product (like airflow) or a crontab and a bash script to do the following 1) monitor the status of your glue jobs (the dependent jobs) 2) run sql on redshift

You could also potentially use cloudwatch (to watch for glue completion) and lambda (to run redshift sql) instead, however I do not recommend that approach due to the run time limit of a lambda job (which may be less time than you need for your redshift sql to complete)

How to run query in AWS Glue

Answers (1)

Related Questions