Amit Singh

Reputation: 43

Airflow tasks execution on multiple s3 keys followed by next task execution

I have a use case where we have a set of, say, 10 files in an S3 directory. We are trying to rename each file to its corresponding mapped file name pattern in a second directory. I have created the task IDs dynamically and uniquely by passing the filename.

for file in rename_list:
    rename_task = RenameOperator(
        task_id="file_rename_task_{}".format(str(file.split(":")[0])),
        s3_conn_id=s3_CONN_ID,
        source_s3_bucket=source_bucket,
        destination_s3_bucket=destination_bucket,
        s3_key=source_prefix + source_key,
        rename_key=destination_key,
        output_prefix=output_prefix,
        dag=dag,
    )

Then I need to perform another operation on each file and finally move it to the final S3 directory. That also runs in a for loop, since we have multiple files.

Now, the issue is that the second operator/task is not getting called. There is no error, but the Airflow logs say "Task is in the 'removed' state which is not a valid state for execution. The task must be cleared in order to be run." The rename tasks from the first loop are all successful, but the second set of operators are simply getting removed: no output, no logs.

Any feedback on what might be going wrong here?

Upvotes: 1

Views: 1103

Answers (1)

arunvelsriram

Reputation: 1096

Airflow workflows are static, and it's not recommended to have dynamic task IDs. As far as I know, dynamic DAGs can be achieved through some hacks, but they are not directly supported by Airflow.

Now, the issue is that the second operator/task is not getting called. There is no error, but the Airflow logs say "Task is in the 'removed' state which is not a valid state for execution. The task must be cleared in order to be run." The rename tasks from the first loop are all successful, but the second set of operators are simply getting removed: no output, no logs.

Without knowing how you have added dependencies between the tasks, it is difficult to identify the issue.

My understanding of the problem:

  1. you have an s3 bucket containing some files in a path (source)
  2. you need to apply some operation on each of those files
  3. move the finished files to a new s3 path (destination)

If my understanding is correct, the way I would design the workflow is:

download_files_from_source >> apply_some_operation_on_downloaded_files >> move_files_to_destination

This will work if you have a shared filesystem across workers. Each run of the DAG should have its own staging directory so that files belonging to different DAG runs do not overlap.
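For illustration, here is a minimal sketch of that layout, assuming Airflow 2.x imports; the callables and the STAGING_ROOT path are hypothetical, and the staging directory is keyed by run_id so parallel DAG runs stay isolated:

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

STAGING_ROOT = "/tmp/s3_rename_staging"  # assumed shared-filesystem location


def download_files(**context):
    # pull every source key from S3 into a per-run staging directory
    staging_dir = os.path.join(STAGING_ROOT, context["run_id"])
    os.makedirs(staging_dir, exist_ok=True)
    # ... download the objects here, e.g. with S3Hook ...


def apply_operation(**context):
    # rename/transform each downloaded file in the staging directory
    staging_dir = os.path.join(STAGING_ROOT, context["run_id"])
    # ... apply the operation here ...


def move_files(**context):
    # upload the processed files to the destination path and clean up staging
    staging_dir = os.path.join(STAGING_ROOT, context["run_id"])
    # ... upload and delete here ...


with DAG(
    dag_id="s3_rename_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    download = PythonOperator(task_id="download_files_from_source",
                              python_callable=download_files)
    operate = PythonOperator(task_id="apply_some_operation_on_downloaded_files",
                             python_callable=apply_operation)
    move = PythonOperator(task_id="move_files_to_destination",
                          python_callable=move_files)

    download >> operate >> move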

Otherwise, you could write a custom operator that does all three tasks from the previous solution in one place, i.e. download the files from source, apply some operation on the downloaded files, and move the files to destination.
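A rough sketch of that alternative, assuming the Amazon provider's S3Hook is available; the operator name, its parameters, and the per-key processing indicated in the comments are illustrative, not the asker's actual RenameOperator:

from airflow.models import BaseOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


class S3ProcessAndMoveOperator(BaseOperator):
    """Download, process and move every object under a source prefix."""

    def __init__(self, s3_conn_id, source_bucket, source_prefix,
                 destination_bucket, destination_prefix, **kwargs):
        super().__init__(**kwargs)
        self.s3_conn_id = s3_conn_id
        self.source_bucket = source_bucket
        self.source_prefix = source_prefix
        self.destination_bucket = destination_bucket
        self.destination_prefix = destination_prefix

    def execute(self, context):
        hook = S3Hook(aws_conn_id=self.s3_conn_id)
        keys = hook.list_keys(bucket_name=self.source_bucket,
                              prefix=self.source_prefix) or []
        for key in keys:
            self.log.info("Processing %s", key)
            # 1. download the object (e.g. hook.download_file / hook.read_key)
            # 2. apply the operation and work out the renamed key
            # 3. upload the result to destination_bucket/destination_prefix
            #    and optionally delete the source object

Keeping everything inside one execute() call avoids per-file task IDs altogether, at the cost of losing per-file retries and visibility in the UI.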

Upvotes: 1
