Reputation: 1
I have 3 Python scripts (a.py, b.py, c.py) that need to run in sequence in ML Studio Pipelines. I'm trying to replicate an example in the code below. However, this code does NOT run the pipeline jobs in sequence.
I do need to run the process in Azure ML Pipelines (and not as a single job of three scripts), and I am aware that I could create dummy inputs and outputs for each command in order to set the hierarchy of the jobs I want to run.
However, I do find that a bit complicated, and I would like to ask if anyone knows another way to set the hierarchy.
For example, if there were a way to configure job "b" with something like b_job.run_after(a_job) — but I cannot find anything that works. It should be easy to set the hierarchy of jobs without needing to create dummy inputs and outputs.
Thank you for your help,
Marios
import warnings
warnings.filterwarnings("ignore")
import yaml
from azure.ai.ml import command, dsl
from azure.ai.ml.entities import PipelineJobSettings
import os
import sys
# Append the directory to system path
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
# Load environment variables
if __name__ == "__main__":
    from dotenv import load_dotenv, find_dotenv
    load_dotenv(find_dotenv())

    # Load configuration from YAML file
    with open(os.path.join("conda.yaml"), encoding="utf-8") as stream:
        config = yaml.safe_load(stream)

    # Initialize MLStudioHandler
    from ml_studio_jobs.mlstudio_handling import MLStudioHandler
    ml_studio_handler = MLStudioHandler()
    env = ml_studio_handler.get_env_version(config['name'])
    compute = ml_studio_handler.get_or_start_compute(os.environ.get("COMPUTE_NAME"))

    mode = "a"
    a_job = command(
        code="/.",  # location of source code
        command="python a.py",
        environment=env,
        display_name=f"test_{mode}",
    )
    a_component = ml_studio_handler.create_or_update(a_job.component)

    mode = "b"
    b_job = command(
        code="/.",  # location of source code
        command="python b.py",
        environment=env,
        display_name=f"test_{mode}",
    )
    b_component = ml_studio_handler.create_or_update(b_job.component)

    mode = "c"
    c_job = command(
        code="/.",  # location of source code
        command="python c.py",
        environment=env,
        display_name=f"test_{mode}",
    )
    c_component = ml_studio_handler.create_or_update(c_job.component)

    # Define the pipeline
    @dsl.pipeline(
        compute=compute.name,
        description="E2E Churn Prediction Pipeline",
    )
    def process():
        # Step 1: Get Data - produces dummy output
        a_job = a_component()
        b_job = b_component()
        c_job = c_component()

    # Create the pipeline
    pipeline = process()
    pipeline.settings = PipelineJobSettings(force_rerun=True, continue_on_step_failure=True)

    # Submit the pipeline job
    pipeline_job = ml_studio_handler.create_or_update_jobs(
        jobs=pipeline,
        experiment_name="test_experiment_ml",
    )
Upvotes: 0
Views: 136
Reputation: 7985
Unless there are dependencies between the components, you cannot run them in sequence.
Here, the dependencies are inputs and outputs.
What you are trying to find is not possible in Azure ML pipelines without inputs and outputs.
Basically, each component in an Azure ML designer pipeline has input and output nodes, which you configure and which are used to run the components in sequence, one after another.
If you find it difficult to configure inputs and outputs, here is a sample version you can use to build your command component using mldesigner.
Also, check this on how to manage inputs and outputs in components.
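To make the input/output approach concrete, here is a minimal sketch with the question's three scripts. The environment name and the --prev/--out argument names are placeholders (your scripts can simply ignore the arguments); the point is that each step declares a uri_folder output and the next step consumes it as an input, which forces Azure ML to schedule a → b → c sequentially:

```python
from azure.ai.ml import command, dsl, Input, Output

env = "azureml:my-env:1"  # placeholder; use your environment as in the question

# Step a: declares a dummy folder output.
a_job = command(
    code="./",
    command="python a.py --out ${{outputs.out_dir}}",
    environment=env,
    outputs={"out_dir": Output(type="uri_folder")},
)

# Step b: consumes a's output, produces its own dummy output.
b_job = command(
    code="./",
    command="python b.py --prev ${{inputs.prev}} --out ${{outputs.out_dir}}",
    environment=env,
    inputs={"prev": Input(type="uri_folder")},
    outputs={"out_dir": Output(type="uri_folder")},
)

# Step c: consumes b's output.
c_job = command(
    code="./",
    command="python c.py --prev ${{inputs.prev}}",
    environment=env,
    inputs={"prev": Input(type="uri_folder")},
)

@dsl.pipeline(description="a -> b -> c in sequence")
def process():
    a = a_job()
    b = b_job(prev=a.outputs.out_dir)  # b cannot start until a finishes
    c = c_job(prev=b.outputs.out_dir)  # c cannot start until b finishes
```

Wiring a.outputs.out_dir into b (and b's into c) is what creates the edge in the pipeline graph; no run_after-style API is needed.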
Upvotes: 0