Valentin Richer

Reputation: 770

Template_searchpath gives TemplateNotFound error in Airflow and cannot find the SQL script

I have a DAG defined like this:

from airflow import DAG
from airflow.contrib.operators import bigquery_operator

tmpl_search_path = '/home/airflow/gcs/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:

    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )

The task create_table calls a SQL script, create_table.sql. This SQL script is not in the DAG folder: it is in a sql_requests folder at the same level as the dags folder. This is the layout inside the GCP Composer bucket (Composer being Google's managed Airflow):

bucket_name
|- airflow.cfg
|- dags
   |_ pipeline.py
|- ...
|_ sql_requests
   |_ create_table.sql

What path do I need to set for template_searchpath to reference the sql_requests folder inside the Airflow bucket on GCP?

I have tried template_searchpath=['/home/airflow/gcs/sql_requests'], template_searchpath=['../sql_requests'] and template_searchpath=['/sql_requests'], but none of these have worked.

The error message I get is 'jinja2.exceptions.TemplateNotFound'

Upvotes: 4

Views: 6448

Answers (4)

Marcelo Vinicius

Reputation: 871

Move the 'sql_requests' folder into the 'dags' folder, so that your code looks like this:

tmpl_search_path = '/home/airflow/dags/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:

    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )

For me, it works!

Upvotes: 2

the pillow

Reputation: 412

You can store files in the mounted/known paths, which are dags/, plugins/ or data/.

The data/ folder has no capacity limits, but it is easy to trip yourself up by using it to store anything the web server needs to read, because the web server cannot access that folder. For example, if you put SQL files in the data/ folder, you will not be able to view the rendered template in the UI, but any task that needs to access the file at run time will run just fine.
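To illustrate, here is a minimal sketch of pointing template_searchpath at the data/ mount, assuming the usual Composer mount path /home/airflow/gcs/data/ and a hypothetical sql_requests subfolder (untested):

# Sketch only: assumes the bucket's data/ folder is mounted at
# /home/airflow/gcs/data/ on the workers.
tmpl_search_path = '/home/airflow/gcs/data/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:

    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',  # resolved against template_searchpath when the task renders
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )
    # Caveat from above: the web server cannot read data/, so the
    # "Rendered template" view in the UI will not work for this task.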

Upvotes: 2

Valentin Richer

Reputation: 770

According to https://cloud.google.com/composer/docs/concepts/cloud-storage, it is not possible to store files needed to execute DAGs anywhere other than the dags or plugins folders:

To avoid a workflow failure, store your DAGs, plugins, and Python modules in the dags/ or plugins/ folders—even if your Python modules do not contain DAGs or plugins.

This is the reason why I had the TemplateNotFound error.
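For completeness, the fix that follows from this is to move the SQL files under dags/ and point template_searchpath there. A minimal sketch, assuming the Composer dags/ mount path /home/airflow/gcs/dags/ and keeping the sql_requests folder name:

# Sketch: sql_requests moved under the dags/ folder of the Composer bucket,
# i.e. gs://bucket_name/dags/sql_requests/create_table.sql
tmpl_search_path = '/home/airflow/gcs/dags/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args, template_searchpath=[tmpl_search_path]) as dag:

    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table,
    )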

Upvotes: 3

Adam Hopkinson

Reputation: 28795

I believe that by default the operator looks for SQL files in the DAG folder, so you could put your SQL file at

gs://composer-bucket-name/dags/sql/create_table.sql

And then reference it as

sql = '/sql/create_table.sql'

If that doesn't work, try it without the leading / (which I'm not sure you need)
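To make that concrete, here is a sketch of the operator call with the SQL file kept relative to the DAG folder (untested, and the leading slash may or may not be needed as noted above):

# Sketch: create_table.sql stored at gs://composer-bucket-name/dags/sql/create_table.sql
create_table = bigquery_operator.BigQueryOperator(
    task_id='create_table',
    sql='/sql/create_table.sql',  # relative to the DAG folder; drop the leading / if this fails
    use_legacy_sql=False,
    destination_dataset_table=some_table,
)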

Edit

If you want to put them in a folder at the root of the bucket, try

sql = '../sql/create_table.sql'

Upvotes: 0
