Himanshu Gautam

Reputation: 399

Pre-existing Compute Resource necessary for running a scheduled Azure ML pipeline?

I have been exploring Azure ML Pipeline. I am referring to this notebook for the below code:

Here is a small snippet from a MS Repo:

train_step = PythonScriptStep(name = "Prepare Data",
                              source_directory = experiment_folder,
                              script_name = "prep_diabetes.py",
                              arguments = ['--input-data', diabetes_ds.as_named_input('raw_data'),
                                           '--prepped-data', prepped_data_folder],
                              outputs = [prepped_data_folder],
                              compute_target = pipeline_cluster,
                              runconfig = pipeline_run_config,
                              allow_reuse = True)

This suggests that when defining a pipeline, we must give each step a compute resource (pipeline_cluster). This makes sense, since a specific step might require specific compute.

But do we need to have this compute resource up and running always, so that whenever a pipeline is triggered, it can find the compute resource?

Also, I figured we could keep a cluster with zero minimum nodes, in which case the cluster is resized whenever the pipeline is triggered. But I think such a setup still incurs a small recurring cost, probably for the container registry. Is this the recommended way to deploy ML pipelines, or is a more efficient approach possible?

Upvotes: 2

Views: 69

Answers (1)

Anders Swanson

Reputation: 3961

Yep, you're right: create a ComputeTarget with a minimum of zero nodes. The container registry costs are ~$0.16 USD/day and, IIRC, that cost is bundled in with Azure Machine Learning.

This is what our team does for our published pipelines in production.
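
For reference, here is a minimal sketch of a scale-to-zero cluster using the azureml-core (SDK v1) calls AmlCompute.provisioning_configuration and ComputeTarget.create. The cluster name, VM size, max_nodes, and idle timeout are placeholder assumptions, not values from the question or the notebook, and get_or_create_cluster is a hypothetical helper name.

```python
from typing import Any, Dict

# Hypothetical values for illustration; pick a VM size and limits that fit your workload.
SCALE_TO_ZERO_SETTINGS: Dict[str, Any] = {
    "vm_size": "STANDARD_DS11_V2",
    "min_nodes": 0,  # cluster scales down to zero nodes when idle
    "max_nodes": 2,
    "idle_seconds_before_scaledown": 1800,
}


def get_or_create_cluster(ws, cluster_name: str = "pipeline-cluster"):
    """Return the named cluster, creating it with min_nodes=0 if it does not exist."""
    # Imported lazily so this module loads even where azureml-core is not installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the cluster if it is already registered in the workspace.
        return ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(**SCALE_TO_ZERO_SETTINGS)
        cluster = ComputeTarget.create(ws, cluster_name, config)
        cluster.wait_for_completion(show_output=True)
        return cluster
```

With a workspace object in hand (for example from Workspace.from_config()), pipeline_cluster = get_or_create_cluster(ws) can then be passed as compute_target to the step in the question. Nodes are allocated when a run is submitted and released after the idle timeout.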

Upvotes: 1
