Louis D.

Reputation: 329

How to wait until a job is done or a file is updated in Airflow

I am trying to use Apache Airflow, with Google Cloud Composer, to schedule batch processing that results in the training of a model with Google AI Platform. I failed to use the Airflow operators, as I explain in this question: unable to specify master_type in MLEngineTrainingOperator

Using the command line I managed to launch a job successfully, so now my issue is to integrate this command into Airflow.

Using a BashOperator I can train the model, but I need to wait for the job to complete before creating a version and setting it as the default. This DAG creates a version before the job is done:

    bash_command_train = "gcloud ai-platform jobs submit training training_job_name " \
                         "--packages=gs://path/to/the/package.tar.gz " \
                         "--python-version=3.5 --region=europe-west1 --runtime-version=1.14" \
                         " --module-name=trainer.train --scale-tier=CUSTOM --master-machine-type=n1-highmem-16"
    bash_train_operator = BashOperator(task_id='train_with_bash_command',
                                       bash_command=bash_command_train,
                                       dag=dag,)

    ...
    create_version_op = MLEngineVersionOperator(
        task_id='create_version',
        project_id=PROJECT,
        model_name=MODEL_NAME,
        version={
            'name': version_name,
            'deploymentUri': export_uri,
            'runtimeVersion': RUNTIME_VERSION,
            'pythonVersion': '3.5',
            'framework': 'SCIKIT_LEARN',
        },
        operation='create')

    set_version_default_op = MLEngineVersionOperator(
        task_id='set_version_as_default',
        project_id=PROJECT,
        model_name=MODEL_NAME,
        version={'name': version_name},
        operation='set_default')

    # Ordering the tasks
    bash_train_operator >> create_version_op >> set_version_default_op

The training results in an updated file in Google Cloud Storage, so I am looking for an operator or a sensor that will wait until this file is updated. I noticed GoogleCloudStorageObjectUpdatedSensor, but I don't know how to make it retry until the file is updated. Another solution would be to check that the job has completed, but I can't find how to do that either.
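
For reference, this is roughly how I imagined wiring in such a sensor (I believe it lives in airflow.contrib.sensors.gcs_sensor on Composer's Airflow 1.10; the bucket and object names below are placeholders, and I am not sure the default ts_func comparison does what I need):

    from airflow.contrib.sensors.gcs_sensor import GoogleCloudStorageObjectUpdatedSensor

    # Keeps poking GCS until the object's "updated" timestamp is newer than the
    # timestamp returned by ts_func (by default, derived from the execution date).
    wait_for_model_export = GoogleCloudStorageObjectUpdatedSensor(
        task_id='wait_for_model_export',
        bucket='my-bucket',                      # placeholder bucket name
        object='path/to/exported/model.joblib',  # placeholder object path
        poke_interval=60,                        # re-check every 60 seconds
        timeout=6 * 60 * 60,                     # fail after 6 hours of waiting
        dag=dag)

    bash_train_operator >> wait_for_model_export >> create_version_op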

Any help would be greatly appreciated.

Upvotes: 2

Views: 1993

Answers (1)

Mason McGough

Reputation: 544

From the Google Cloud documentation for the --stream-logs flag:

"Block until job completion and stream the logs while the job runs."

Add this flag to bash_command_train and I think it should solve your problem. The command should only return once the job has finished, at which point Airflow will mark the task as a success. It also lets you monitor your training job's logs from Airflow.
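
Something like this, reusing the command from your question:

    bash_command_train = "gcloud ai-platform jobs submit training training_job_name " \
                         "--stream-logs " \
                         "--packages=gs://path/to/the/package.tar.gz " \
                         "--python-version=3.5 --region=europe-west1 --runtime-version=1.14" \
                         " --module-name=trainer.train --scale-tier=CUSTOM --master-machine-type=n1-highmem-16"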

Upvotes: 2
