Obiii

Reputation: 824

Unable to parametrize ML pipeline endpoint name - Azure Data Factory

Sorry for the long post; I need to explain it properly for people to understand.

I have a pipeline in Data Factory that triggers a published AML endpoint.

I am trying to parametrize this ADF pipeline so that I can deploy it to test and prod, but the AML endpoints are different in test and prod.

Therefore, I have tried to edit the parameter configuration in ADF as shown below.

Here in the section Microsoft.DataFactory/factories/pipelines I add "*":"=" so that all the pipeline parameters are parametrized:

 "Microsoft.DataFactory/factories/pipelines": {
        "*": "="
    }

After this I export the template to see which parameters appear in the JSON. There are a lot of them, but I do not see any parameter that has the AML endpoint name as its value; I only see that the endpoint ID is parametrized.


My question is: is it possible to parametrize the AML endpoint by name, so that when deploying ADF to test I can just provide the AML endpoint name and it picks up the ID automatically?


Upvotes: 1

Views: 725

Answers (3)

DEEPAK TOMAR

Reputation: 34

Making changes to ADF (ARMTemplateForFactory.json) or Synapse (TemplateForWorkspace.json) inside a DevOps CI/CD pipeline

Sometimes parameters are not automatically added to the parameter file, i.e. ARMTemplateParametersForFactory.json / TemplateParametersForWorkspace.json; MLPipelineEndpointId is one example. In the case of an ML pipeline you can use PipelineId as a parameter instead, but it can change every time the ML pipeline is updated.

You can solve this issue by replacing the value in the ADF (ARMTemplateForFactory.json) or Synapse (TemplateForWorkspace.json) template using Azure PowerShell. The idea is simple: use PowerShell to open the ARM template and replace the value based on the environment. It works exactly like overwriting parameters within DevOps.

This editing is done on the fly, i.e. the DevOps artifact is updated rather than the repo file, so the ADF/Synapse repository won't change, just like when overwriting parameters.

Issue: We currently have two environments for Synapse, called bla-bla-dev and bla-bla-test. The dev Synapse environment uses the dev machine learning environment, and the test Synapse environment uses the test ML environment. But MLPipelineEndpointId is grayed out in dev Synapse, and the parameter is not present in the parameter file, so it can't be overwritten normally.


Solution: Use Azure PowerShell to run the command below:

(Get-Content $(System.DefaultWorkingDirectory)/Artifacts_source/bla-bla-dev/TemplateForWorkspace.json).Replace($(scoringMLPipelineEndPointDev), $(scoringMLPipelineEndPoint)) | Set-Content $(System.DefaultWorkingDirectory)/Artifacts_source/bla-bla-dev/TemplateForWorkspace.json
  • $(System.DefaultWorkingDirectory) = points to the release pipeline artifacts, which are based on the ARM template repository.
  • $(scoringMLPipelineEndPointDev) = the value you would like to replace.
  • $(scoringMLPipelineEndPoint) = the value that will replace the dev parameter value.
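The Get-Content / Replace / Set-Content one-liner above can also be sketched in Python, assuming a local file path and plain string values in place of the DevOps pipeline macros (replace_endpoint_id is a hypothetical helper, not part of any SDK):

```python
from pathlib import Path


def replace_endpoint_id(template_path, dev_endpoint_id, target_endpoint_id):
    """Swap the dev MLPipelineEndpointId for the target environment's id
    inside an exported ARM template, overwriting the file in place."""
    path = Path(template_path)
    text = path.read_text()
    path.write_text(text.replace(dev_endpoint_id, target_endpoint_id))
```

In a release pipeline the two id arguments would come from the DevOps variables shown above; here they are just strings.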

Steps

  1. Create two DevOps pipeline variables: one for the dev environment (the value to be replaced) and one for the test environment (the test MLPipelineEndpointId for the test Synapse pipeline).


  2. Add an Azure PowerShell step to the ADF/Synapse release DevOps pipeline. This step has to be placed before the ARM template deployment step.

    (Get-Content $(System.DefaultWorkingDirectory)/Artifacts_source/bla-bla-dev/TemplateForWorkspace.json).Replace($(scoringMLPipelineEndPointDev), $(scoringMLPipelineEndPoint)) | Set-Content $(System.DefaultWorkingDirectory)/Artifacts_source/bla-bla-dev/TemplateForWorkspace.json

Once the deployment is done, you will see that your test environment is pointing to the test MLPipelineEndpointId.

Upvotes: 0

tfkLSTM

Reputation: 181

I finally fixed this.

The trick is not to choose Pipeline Endpoint ID, but to choose Pipeline ID.

Pipeline ID can be parametrized, and I have set it up to come from a global parameter, so I do not need to find the right level of indentation every time.


Later you add the global parameters to your ARM template.


And in the parameter template you add:

"Microsoft.DataFactory/factories": {
    "properties": {
        "globalParameters": {
            "*": {
                "value": "="
            }
        },
        "globalConfigurations": {
            "*": "="
        },
        "encryption": {
            "*": "=",
            "identity": {
                "*": "="
            }
        }
    }
},
"Microsoft.DataFactory/factories/globalparameters": {
    "properties": {
        "*": {
            "value": "="
        }
    }
}
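A typo in this rules file is easy to miss, so one quick sanity check is to confirm the customized rules parse as valid JSON before committing them. A minimal sketch, using a trimmed version of the rules above wrapped in a single object:

```python
import json

# Trimmed copy of the custom parameterization rules; json.loads raises
# json.JSONDecodeError if a brace or comma is missing.
PARAM_RULES = """
{
    "Microsoft.DataFactory/factories": {
        "properties": {
            "globalParameters": {
                "*": {"value": "="}
            }
        }
    },
    "Microsoft.DataFactory/factories/globalparameters": {
        "properties": {
            "*": {"value": "="}
        }
    }
}
"""

rules = json.loads(PARAM_RULES)
```

In practice you would load the parameter template file itself rather than an inline string.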

Finally, I wrote a Python CLI tool to get the latest pipeline ID for a given published pipeline endpoint name:

import argparse
from azureml.pipeline.core import PipelineEndpoint, PublishedPipeline, Pipeline
from azureml.core import Workspace
from env_variables import Env
from manage_workspace import get_workspace


def get_latest_published_endpoint(ws : Workspace, pipeline_name : str) -> str:
    """
    Get the latest published endpoint given a machine learning pipeline name.
    The function is used to update the pipeline id in ADF deploy pipeline

    Parameters
    ------
    ws : azureml.core.Workspace
        A workspace object to use to search for the models
    pipeline_name : str
        A string containing the pipeline name to retrieve the latest version

    Returns
    -------
    endpoint_id : str
        The id of the latest pipeline published behind the endpoint
    pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name=pipeline_name)
    endpoint_id = pipeline_endpoint.get_pipeline().id # this gives back the pipeline id
    # pipeline_endpoint.id gives back the pipeline endpoint id which can not be set
    # as dynamic parameter in ADF in an easy way

    return endpoint_id

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--monitoring_pipeline_name", type=str,
                        help="Pipeline Name to get endpoint id",
                        default='yourmonitoringpipeline')
    parser.add_argument("--training_pipeline_name", type=str,
                        help="Pipeline Name to get endpoint id",
                        default='yourtrainingpipeline')
    parser.add_argument("--scoring_pipeline_name", type=str,
                        help="Pipeline Name to get endpoint id",
                        default='yourscoringpipeline')
    args, _ = parser.parse_known_args()
    e = Env()

    ws = get_workspace(e.workspace_name, e.subscription_id, e.resource_group)  # type: ignore
    latest_monitoring_endpoint = get_latest_published_endpoint(ws, pipeline_name=args.monitoring_pipeline_name)  # type: ignore
    latest_training_endpoint = get_latest_published_endpoint(ws, pipeline_name=args.training_pipeline_name) # type: ignore
    latest_scoring_endpoint = get_latest_published_endpoint(ws, pipeline_name=args.scoring_pipeline_name) # type: ignore
    print('##vso[task.setvariable variable=MONITORING_PIPELINE_ID;]%s' % (latest_monitoring_endpoint))
    print('##vso[task.setvariable variable=TRAINING_PIPELINE_ID;]%s' % (latest_training_endpoint))
    print('##vso[task.setvariable variable=SCORING_PIPELINE_ID;]%s' % (latest_scoring_endpoint))

By printing the variables this way, they are exposed as pipeline variables that I can later pick up in the ARM deploy step.
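The print statements rely on Azure DevOps logging commands: any line written to stdout in the form `##vso[task.setvariable variable=NAME;]VALUE` sets NAME as a pipeline variable for subsequent steps. A minimal sketch of how such a line is assembled (set_pipeline_variable is a hypothetical helper, not part of any SDK):

```python
def set_pipeline_variable(name, value):
    """Return the Azure DevOps logging-command line that, when printed to
    stdout inside a pipeline task, exports `value` under `name`."""
    return "##vso[task.setvariable variable=%s;]%s" % (name, value)


print(set_pipeline_variable("TRAINING_PIPELINE_ID", "abc-123"))
# prints: ##vso[task.setvariable variable=TRAINING_PIPELINE_ID;]abc-123
```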


And then we have our desired setup: different pipeline IDs for different environments.

Maybe material for a blog post, as it works like a charm.

Upvotes: 1

Parimala Killada

Reputation: 1

I faced a similar issue when deploying ADF pipelines with ML between environments. Unfortunately, as of now, the ADF parameter file does not have the ML pipeline name as a parameter value. The only workaround is modifying the parameter (JSON) file so that it aligns with your pipeline design. For example, I am triggering the ML pipeline endpoint inside ForEach activity --> If Condition --> ML pipeline.

Here is my parameter file values:

"Microsoft.DataFactory/factories/pipelines": {
    "properties": {
        "activities": [
            {
                "typeProperties": {
                    "mlPipelineEndpointId": "=",
                    "url": {
                        "value": "="
                    },
                    "ifFalseActivities": [
                        {
                            "typeProperties": {
                                "mlPipelineEndpointId": "="
                            }
                        }
                    ],
                    "ifTrueActivities": [
                        {
                            "typeProperties": {
                                "mlPipelineEndpointId": "="
                            }
                        }
                    ],
                    "activities": [
                        {
                            "typeProperties": {
                                "mlPipelineEndpointId": "=",
                                "ifFalseActivities": [
                                    {
                                        "typeProperties": {
                                            "mlPipelineEndpointId": "=",
                                            "url": "="
                                        }
                                    }
                                ],
                                "ifTrueActivities": [
                                    {
                                        "typeProperties": {
                                            "mlPipelineEndpointId": "=",
                                            "url": "="
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}

After you export the ARM template, the JSON parameter file has entries for your ML endpoints:

"ADFPIPELINE_NAME_properties_1_typeProperties_1_typeProperties_0_typeProperties_mlPipelineEndpointId": {
    "value": "445xxxxx-xxxx-xxxxx-xxxxx"
}
It is a lot of manual effort to maintain if the design changes frequently, but so far it has worked for me. Hope this answers your question.

Upvotes: 0
