Reputation: 586
The accepted answer to this question states that
"...the gs://my-bucket/dags folder is available in the scheduler, web server, and workers at /home/airflow/gcs/dags."
(which is supported by the newer docs)
So I wrote a bash operator like this:
# Import assumed from the `bash.BashOperator` usage below
from airflow.operators import bash

t1 = bash.BashOperator(
    task_id='my_test',
    bash_command="touch /home/airflow/gcs/data/test.txt",
)
I thought that by prefixing the file name with the path given in that answer, the command would write to the data folder of the storage bucket associated with my Cloud Composer environment. Similarly, touch test.txt also ran successfully but didn't create a file anywhere I can see (I assume it is written to the worker's temporary storage, which is deleted when the worker shuts down after the DAG run). I can't seem to persist any data from simple commands run through a DAG. Is it even possible to write out files from a bash script running in Cloud Composer? Thank you in advance.
Upvotes: 0
Views: 1556
Reputation: 586
Bizarrely, I needed to add a space at the end of the string containing the Bash command.
t1 = bash.BashOperator(
    task_id='my_test',
    # Note the trailing space before the closing quote
    bash_command="touch /home/airflow/gcs/data/test.txt ",
)
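One way to confirm whether a write really lands in the data folder is to do the equivalent of touch from Python and check the result explicitly. This is only a minimal sketch: touch_and_check is a hypothetical helper, not part of Airflow or Composer, and the mapping of /home/airflow/gcs/data to the bucket's data/ folder is assumed from the path used in the question.

```python
import os
import tempfile

# Path taken from the question; on Cloud Composer it is expected to map to
# the bucket's data/ folder. Treat this as an assumption about the environment.
COMPOSER_DATA_DIR = "/home/airflow/gcs/data"


def touch_and_check(directory, filename="test.txt"):
    """Create an empty file in `directory` and report whether it exists."""
    if not os.path.isdir(directory):
        # Fail loudly so a missing mount is not mistaken for a bad command.
        raise FileNotFoundError(f"directory not found: {directory}")
    path = os.path.join(directory, filename)
    # Equivalent of `touch`: create the file if needed, keep existing content.
    with open(path, "a"):
        os.utime(path, None)
    return path, os.path.isfile(path)


if __name__ == "__main__":
    # Outside Composer the mount is absent, so fall back to a temp directory.
    if os.path.isdir(COMPOSER_DATA_DIR):
        target = COMPOSER_DATA_DIR
    else:
        target = tempfile.mkdtemp()
    print(touch_and_check(target))
```

A function like this could be called from a Python task instead of a Bash one, which sidesteps any special handling of the bash_command string.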
The frustrating thing was that the error said the path didn't exist, so I went down a rabbit hole mapping the directories of the Airflow worker until I was absolutely certain that it did. Then I found a similar issue here. That said, I didn't get the 'Jinja Template not Found Error' that I should have got according to this note.
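For context on that note: as far as I understand it, Airflow treats a templated string that ends in one of the operator's template extensions as the path to a template file, and a trailing space is the usual workaround. The sketch below only mimics that suffix check in plain Python. looks_like_template_file is a hypothetical helper, not an Airflow API; the extension tuple is my assumption about BashOperator's defaults; and run.sh is a made-up script name.

```python
# Assumed defaults: BashOperator is believed to declare these template extensions.
ASSUMED_BASH_TEMPLATE_EXT = (".sh", ".bash")


def looks_like_template_file(command, template_ext=ASSUMED_BASH_TEMPLATE_EXT):
    """Hypothetical helper: True if the string ends with a template extension."""
    return command.endswith(tuple(template_ext))


if __name__ == "__main__":
    examples = (
        "/home/airflow/gcs/data/run.sh",   # hypothetical script path
        "/home/airflow/gcs/data/run.sh ",  # same path with a trailing space
        "touch /home/airflow/gcs/data/test.txt",
    )
    for cmd in examples:
        print(repr(cmd), looks_like_template_file(cmd))
```

Under these assumptions only the first string matches. The command ending in .txt does not, which may be why no Jinja error appeared here, although it does not explain why the trailing space made a difference in this case.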
Upvotes: 2