Reputation: 1
I have 3 Python scripts (a.py, b.py, c.py) that need to run in sequence in ML Studio Pipelines. I'm trying to replicate an example in the code below. However, this code does NOT run the pipeline jobs in sequence.
I do need to run the process in Azure ML Pipelines (and not as a single job of three scripts), and I am aware that I could create dummy inputs and outputs for each command in order to set the hierarchy of the jobs I want to run.
However, I do find that a bit complicated, and I would like to ask if anyone knows another way to set the hierarchy.
For example, if there were a way to configure job "b" with something like b_job.run_after(a_job) — but I cannot find anything that works. It should be easy to set the hierarchy of jobs without needing to create dummy inputs and outputs.
Thank you for your help,
Marios
import warnings
warnings.filterwarnings("ignore")
import yaml
from azure.ai.ml import command, dsl
from azure.ai.ml.entities import PipelineJobSettings
import os
import sys
# Append the directory to system path
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
# Load environment variables
if __name__ == "__main__":
    from dotenv import load_dotenv, find_dotenv
    load_dotenv(find_dotenv())

    # Load configuration from YAML file
    with open(os.path.join("conda.yaml"), encoding="utf-8") as stream:
        config = yaml.safe_load(stream)

    # Initialize MLStudioHandler
    from ml_studio_jobs.mlstudio_handling import MLStudioHandler
    ml_studio_handler = MLStudioHandler()
    env = ml_studio_handler.get_env_version(config['name'])
    compute = ml_studio_handler.get_or_start_compute(os.environ.get("COMPUTE_NAME"))

    mode = "a"
    a_job = command(
        code="/.",  # location of source code
        command="python a.py",
        environment=env,
        display_name=f"test_{mode}",
    )
    a_component = ml_studio_handler.create_or_update(a_job.component)

    mode = "b"
    b_job = command(
        code="/.",  # location of source code
        command="python b.py",
        environment=env,
        display_name=f"test_{mode}",
    )
    b_component = ml_studio_handler.create_or_update(b_job.component)

    mode = "c"
    c_job = command(
        code="/.",  # location of source code
        command="python c.py",
        environment=env,
        display_name=f"test_{mode}",
    )
    c_component = ml_studio_handler.create_or_update(c_job.component)

    # Define the pipeline
    @dsl.pipeline(
        compute=compute.name,
        description="E2E Churn Prediction Pipeline",
    )
    def process():
        # Step 1: Get Data - produces dummy output
        a_job = a_component()
        b_job = b_component()
        c_job = c_component()

    # Create the pipeline
    pipeline = process()
    pipeline.settings = PipelineJobSettings(force_rerun=True, continue_on_step_failure=True)

    # Submit the pipeline job
    pipeline_job = ml_studio_handler.create_or_update_jobs(
        jobs=pipeline,
        experiment_name="test_experiment_ml",
    )
Upvotes: 0
Views: 136
Reputation: 7985
Unless there are dependencies between the components, you cannot run them in sequence.
Here, the dependencies are inputs and outputs.
What you are trying to find is not possible in Azure ML pipelines without inputs and outputs.
Basically, each component in an Azure ML designer pipeline has input and output nodes, which you configure and which are used to run the components in sequence, one after another.
If you find it difficult to configure inputs and outputs, here is a sample version you can use to build your command component using mldesigner.
Also, check this on how to manage inputs and outputs in components.
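To make the input/output approach concrete, here is a minimal sketch with the question's three scripts. The environment name and the --prev/--out argument names are placeholders (your scripts can simply ignore the arguments); the point is that each step declares a uri_folder output and the next step consumes it as an input, which forces Azure ML to schedule a → b → c sequentially:

```python
from azure.ai.ml import command, dsl, Input, Output

env = "azureml:my-env:1"  # placeholder; use your environment as in the question

# Step a: declares a dummy folder output.
a_job = command(
    code="./",
    command="python a.py --out ${{outputs.out_dir}}",
    environment=env,
    outputs={"out_dir": Output(type="uri_folder")},
)

# Step b: consumes a's output, produces its own dummy output.
b_job = command(
    code="./",
    command="python b.py --prev ${{inputs.prev}} --out ${{outputs.out_dir}}",
    environment=env,
    inputs={"prev": Input(type="uri_folder")},
    outputs={"out_dir": Output(type="uri_folder")},
)

# Step c: consumes b's output.
c_job = command(
    code="./",
    command="python c.py --prev ${{inputs.prev}}",
    environment=env,
    inputs={"prev": Input(type="uri_folder")},
)

@dsl.pipeline(description="a -> b -> c in sequence")
def process():
    a = a_job()
    b = b_job(prev=a.outputs.out_dir)  # b cannot start until a finishes
    c = c_job(prev=b.outputs.out_dir)  # c cannot start until b finishes
```

Wiring a.outputs.out_dir into b (and b's into c) is what creates the edge in the pipeline graph; no run_after-style API is needed.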
Upvotes: 0