Reputation: 770
I have a DAG described like this:
tmpl_search_path = '/home/airflow/gcs/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args,
         template_searchpath=[tmpl_search_path]) as dag:
    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table)
The task create_table calls a SQL script, create_table.sql. This SQL script is not in the DAG folder: it sits in a sql_requests folder at the same level as the DAG folder. The layout inside the GCP Composer (Google's managed Airflow) bucket is:
bucket_name
|- airflow.cfg
|- dags
|_ pipeline.py
|- ...
|_ sql_requests
|_ create_table.sql
What path do I need to set for template_searchpath to reference the sql_requests folder inside the Airflow bucket on GCP?
I have tried template_searchpath=['/home/airflow/gcs/sql_requests'], template_searchpath=['../sql_requests'] and template_searchpath=['/sql_requests'], but none of these has worked.
The error I get is jinja2.exceptions.TemplateNotFound.
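For context on the error: Airflow hands template_searchpath to Jinja, which only looks inside the listed directories. A minimal stdlib sketch of that lookup (the directory layout below is illustrative, not the real Composer mount):

```python
import os
import tempfile

def find_template(name, search_paths):
    """Mimic a file-system template loader: return the first match
    on the search path, or None if the template is nowhere on it."""
    for base in search_paths:
        candidate = os.path.join(base, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # Airflow surfaces this as jinja2.exceptions.TemplateNotFound

# Illustrative layout: dags/ and sql_requests/ side by side in a bucket
bucket = tempfile.mkdtemp()
os.makedirs(os.path.join(bucket, "dags"))
os.makedirs(os.path.join(bucket, "sql_requests"))
with open(os.path.join(bucket, "sql_requests", "create_table.sql"), "w") as f:
    f.write("CREATE TABLE ...")

# Searching only the dags folder fails, just like TemplateNotFound
print(find_template("create_table.sql", [os.path.join(bucket, "dags")]))

# A search path that actually contains the file succeeds
print(find_template("create_table.sql", [os.path.join(bucket, "sql_requests")]))
```

So whatever string is passed must be a directory that exists on the worker and contains the file.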
Upvotes: 4
Views: 6448
Reputation: 871
Move the sql_requests folder into the dags folder, so that your code looks like this:
tmpl_search_path = '/home/airflow/dags/sql_requests/'

with DAG(dag_id='pipeline', default_args=default_args,
         template_searchpath=[tmpl_search_path]) as dag:
    create_table = bigquery_operator.BigQueryOperator(
        task_id='create_table',
        sql='create_table.sql',
        use_legacy_sql=False,
        destination_dataset_table=some_table)
For me, it works!
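A sketch of the move itself, run against a local stand-in for the bucket layout so the commands are easy to verify (bucket_name is a placeholder; on GCS the equivalent would be gsutil -m cp -r gs://bucket_name/sql_requests gs://bucket_name/dags/):

```shell
# Recreate the layout from the question locally (illustrative names)
cd "$(mktemp -d)"
mkdir -p bucket_name/dags bucket_name/sql_requests
echo "CREATE TABLE ..." > bucket_name/sql_requests/create_table.sql

# Move sql_requests under dags/ so the worker sees it inside the DAG folder
mv bucket_name/sql_requests bucket_name/dags/

ls bucket_name/dags/sql_requests/
```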
Upvotes: 2
Reputation: 412
You can store files in the mounted/known paths, which are dags/, plugins/, or data/.
The data/ folder has no capacity limit, but it is easy to trip yourself up by using it to store anything the web server needs to read, because the web server cannot access that folder. For example, if you put SQL files in the data/ folder, you will not be able to view the rendered template in the UI, but any task that needs the file at run time will run fine.
Upvotes: 2
Reputation: 770
According to https://cloud.google.com/composer/docs/concepts/cloud-storage, files needed to execute DAGs cannot be stored anywhere other than the dags or plugins folders:
To avoid a workflow failure, store your DAGs, plugins, and Python modules in the dags/ or plugins/ folders—even if your Python modules do not contain DAGs or plugins.
This is the reason I had the TemplateNotFound error.
Upvotes: 3
Reputation: 28795
I believe by default the operator looks for SQL files in the DAG folder, so you could put your SQL at
gs://composer-bucket-name/dags/sql/create_table.sql
and then reference it as
sql = '/sql/create_table.sql'
If that doesn't work, try it without the leading / (which I'm not sure you need).
Edit
If you want to put them in a folder at the root of the bucket, try
sql = '../sql/create_table.sql'
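Whether a leading / or a ../ works depends on how the loader joins the name onto each search directory (some loader versions refuse names that escape the search path). A quick stdlib check of what those strings would resolve to, assuming the Composer DAG mount path:

```python
import os.path

# Assumed mount point of the bucket's dags/ folder on Composer workers
dag_folder = "/home/airflow/gcs/dags"

for name in ("sql/create_table.sql",
             "/sql/create_table.sql",
             "../sql/create_table.sql"):
    # lstrip("/") because os.path.join discards the base for absolute names
    resolved = os.path.normpath(os.path.join(dag_folder, name.lstrip("/")))
    print(name, "->", resolved)
```

The ../ form resolves to a sql/ folder at the bucket root, i.e. outside dags/, which is exactly the location the Composer docs warn against.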
Upvotes: 0