Reputation: 399
I have been exploring Azure ML Pipelines. I am referring to this notebook for the code below; here is a small snippet from the MS repo:
from azureml.pipeline.steps import PythonScriptStep

# Pipeline step that runs prep_diabetes.py on the pipeline_cluster compute target
train_step = PythonScriptStep(name="Prepare Data",
                              source_directory=experiment_folder,
                              script_name="prep_diabetes.py",
                              arguments=['--input-data', diabetes_ds.as_named_input('raw_data'),
                                         '--prepped-data', prepped_data_folder],
                              outputs=[prepped_data_folder],
                              compute_target=pipeline_cluster,
                              runconfig=pipeline_run_config,
                              allow_reuse=True)
This suggests that while defining a pipeline, we must provide it with a compute resource (pipeline_cluster). That obviously makes sense, since a specific step might require specific compute.
But does this compute resource need to be up and running at all times, so that whenever the pipeline is triggered it can find the compute resource?
Also, I figured we could keep a cluster with a minimum of zero nodes, in which case the cluster is resized whenever the pipeline is triggered. But I think such a setup still incurs a small recurring cost, probably for the container registry. Is this the recommended way to deploy ML pipelines, or is a more efficient approach possible?
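For context, this is the kind of scale-to-zero cluster I have in mind (a minimal sketch using the Azure ML SDK v1; the cluster name and VM size are just placeholders):

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()
cluster_name = "pipeline-cluster"   # placeholder name

try:
    # Reuse the cluster if it already exists in the workspace
    pipeline_cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Otherwise create one that scales down to zero nodes when idle
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS2_V2",          # placeholder VM size
        min_nodes=0,                        # no nodes (and no VM cost) while idle
        max_nodes=2,
        idle_seconds_before_scaledown=1800)
    pipeline_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
    pipeline_cluster.wait_for_completion(show_output=True)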
Upvotes: 2
Views: 69
Reputation: 3961
Yep, you're right: create a ComputeTarget with a minimum of zero nodes. The container registry cost is ~$0.16 USD/day and, IIRC, that cost is bundled in with Azure Machine Learning.
This is what our team does for our published pipelines in production.
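As a rough sketch of how that fits together (SDK v1; the pipeline and experiment names here are illustrative, not from the original code), the compute target is only resolved when a run is submitted, so the cluster can sit at zero nodes in between:

from azureml.core import Workspace
from azureml.pipeline.core import Pipeline

ws = Workspace.from_config()

# Assemble and publish the pipeline; the cluster stays at zero nodes until a run starts
pipeline = Pipeline(workspace=ws, steps=[train_step])
published = pipeline.publish(name="diabetes-pipeline",
                             description="Prep + train pipeline",
                             version="1.0")

# Submitting a run scales the cluster up from zero, executes the steps,
# and lets it scale back down after the idle timeout
run = published.submit(ws, experiment_name="diabetes-pipeline-run")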
Upvotes: 1