Reputation: 1577
Full workflow:
Looking at DAG run history in Composer/Airflow UI, where there's a task failure and then immediately following by task success.
The purpose of the task is to upload a file to BQ. The path to file is provided by the Cloud Function.
There is a clear pattern where the logs of the failed task show that the task tried to process a file with pattern like my_timestamped_file_name.csv.part
The following task that succeeds show in the logs that the file it processed had the same pattern without the .part
: my_timestamped_file_name.csv
It seems to me that the Cloud Function (CF) is being triggered by the partially uploaded file created by SFTP mirror instead of waiting for the file to be done uploading. Of course, when the file is completely uploaded, the .part
file disappears and the task fails because it has nothing to process.
My Cloud Function's Event Type is defined as Finalize/Create. Is there a way to avoid partially uploaded files? Other than using a hacky conditional statement inside the CF to avoid files that end with .part
?
Upvotes: 1
Views: 642
Reputation: 10069
The rule we are creating is when ever a file is created it should trigger the GCF, so it is doing the job correctly. Possible solutions are
Upvotes: 1