ModelAssetPathNotFoundInStorage error when mlflow.sklearn.autolog() used to train within an Azure ML YAML Pipeline

Exception raised in Azure ML GUI

UserErrorException:
    Message: Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}
    InnerException None
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}"
    }
}
Marking the experiment as failed because initial child jobs have failed due to user error
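
For reference, the registry error complains that nothing exists under the .../model/ path in blob storage. A minimal sketch (illustrative only; the "sample_model" path is made up and the exact file list varies between MLflow versions) of what mlflow.sklearn.save_model() writes locally, i.e. what the model asset path is expected to contain:

# Illustrative sketch: inspect what mlflow.sklearn.save_model() produces locally.
# "sample_model" is an arbitrary path for this example and must not already exist;
# the exact files vary by MLflow version.
import os

import mlflow.sklearn
from sklearn.svm import SVC

mlflow.sklearn.save_model(SVC(), "sample_model")
print(sorted(os.listdir("sample_model")))
# Typically includes: MLmodel, conda.yaml, model.pkl, requirements.txt, ...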

CLI Command

$ az ml job create --subscription <REDACTED> --resource-group <REDACTED> --workspace-name <REDACTED> --file /home/azureuser/cloudfiles/code/Users/<REDACTED>/repos/<REDACTED>/src/assets/pipeline_tune.yml --stream

RunId: quirky_bone_tf250gdlfg
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>

Streaming logs/azureml/executionlogs.txt
========================================

[2023-01-12 22:58:42Z] Submitting 1 runs, first five are: <REDACTED>
[2023-01-12 23:03:46Z] Execution of experiment failed, update experiment status and cancel running nodes.

Execution Summary
=================
RunId: <REDACTED>
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>
Exception : 
 {
    "error": {
        "code": "UserError",
        "message": "Pipeline has some failed steps. See child run or execution logs for more details.",
        "message_format": "Pipeline has some failed steps. {0}",
        "message_parameters": {},
        "reference_code": "PipelineHasStepJobFailed",
        "details": []
    },
    "environment": "<REDACTED>",
    "location": "<REDACTED>",
    "time": "2023-01-12T23:03:45.982134Z",
    "component_name": ""
} 
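
For debugging, the failed child run (HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0, taken from the error above) can be inspected with the MLflow client to see which artifacts it actually logged. A minimal sketch, assuming MLFLOW_TRACKING_URI already points at the workspace (as it does on Azure ML compute):

# Sketch: list the artifacts logged by the failed sweep child run.
# Assumes the MLflow tracking URI is already set to the Azure ML workspace;
# the run ID is the child run named in the error message.
from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = "HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0"


def walk_artifacts(client, run_id, path=None, indent=0):
    """Recursively print the artifact tree of a run."""
    for item in client.list_artifacts(run_id, path):
        print(" " * indent + item.path)
        if item.is_dir:
            walk_artifacts(client, run_id, item.path, indent + 2)


walk_artifacts(client, run_id)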

Folder structure

src/
  assets/
    component_train.yml
    pipeline_tune.yml
  train.py

src/assets/pipeline_tune.yml

# References
# ----------
# - How to create component pipelines
#   - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Pattern reference
#   - https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/pipeline_with_hyperparameter_sweep
# - Pipeline schema
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-pipeline
# - Sweep Job schema (hyperparameter tuning)
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-sweep
# - Core Azure ML YAML syntax
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

# -------------------------------------------------------------------
# Pipeline settings
# - Having inputs defined at the pipeline level, instead of the first
#   job, allows for parameterisation of the pipeline via both CLI/SDK
experiment_name: <REDACTED>
description: Tune hyperparameters for training a scikit-learn SVM on the Iris dataset.
settings:
    default_compute: azureml:aml-compute-cpu
    default_datastore: azureml:workspaceblobstore
inputs:
  data: 
    type: uri_file
    mode: ro_mount
    path: wasbs://[email protected]/iris.csv
outputs:
  predict:
    type: uri_folder
    mode: rw_mount
    path: azureml://datastores/workspaceblobstore/paths/<REDACTED>

# -------------------------------------------------------------------
# Jobs
jobs:
  
  # Tune job
  tune:
    type: sweep
    inputs:
      data: ${{parent.inputs.data}}
    outputs:
      model:
        type: mlflow_model
      test_data:
        type: uri_folder

    trial: ./component_train.yml

    search_space:
      c_value:
        type: uniform
        min_value: 0.5
        max_value: 0.9
      kernel:
        type: choice
        values:
          - rbf
          - linear
          - poly
      coef0:
        type: uniform
        min_value: 0.1
        max_value: 1

    sampling_algorithm: random

    objective:
      goal: minimize
      primary_metric: training_f1_score

    limits:
      max_total_trials: 20
      max_concurrent_trials: 10
      timeout: 7200

  # Score test data
  predict:
    type: command
    inputs:
      model: ${{parent.jobs.tune.outputs.model}}
      test_data: ${{parent.jobs.tune.outputs.test_data}}
    outputs:
      predictions: ${{parent.outputs.predict}}
    component: ./component_predict.yml
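
(component_predict.yml and its scoring script are not shown. For context, a minimal sketch of how such a script could consume the mlflow_model output bound from the tune step; the --model/--test_data/--predictions argument names are assumptions, not the actual component definition.)

# Sketch of a predict-step script (hypothetical; the real component_predict.yml is not shown).
# Assumes the component passes --model (the sweep's mlflow_model output folder),
# --test_data (folder containing X_test.csv) and --predictions (output folder).
import argparse
from pathlib import Path

import mlflow.sklearn
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--model", type=str)
parser.add_argument("--test_data", type=str)
parser.add_argument("--predictions", type=str)
args = parser.parse_args()

model = mlflow.sklearn.load_model(args.model)  # load the saved MLflow model folder
X_test = pd.read_csv(Path(args.test_data) / "X_test.csv")
pd.DataFrame({"prediction": model.predict(X_test)}).to_csv(
    Path(args.predictions) / "predictions.csv", index=False
)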

src/assets/component_train.yml

# References
# ----------
# - How to create component pipelines
#   - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Command schema
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-command
# - Core Azure ML YAML syntax
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

# -------------------------------------------------------------------
# Command settings
version: 1
description: Training a scikit-learn SVM on the Iris dataset
environment: azureml:<REDACTED>:1
inputs: 
  data:
    type: uri_file
  c_value:
    type: number
    default: 1.0
  kernel:
    type: string
    default: rbf
  coef0: 
    type: number
    default: 0
outputs:
  model:
    type: mlflow_model
  test_data:
    type: uri_folder


# -------------------------------------------------------------------
# Job
code: ..
command: >-
  python train.py
  --data ${{inputs.data}}
  --C ${{inputs.c_value}}
  --kernel ${{inputs.kernel}}
  --coef0 ${{inputs.coef0}}
  --outputs_model ${{outputs.model}}
  --outputs_test_data ${{outputs.test_data}}

src/train.py

"""
Notes
-----
- Imports in this file must match the imports in `score.py` to allow
  pickle objects to be loaded correctly by `score.py`

References
----------
- Azure ML Environments and ScriptRunConfig for training
    - i.e How to execute this script against Azure ML compute cluster
    - https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-environments-for-training
- Azure ML - Hyperparameter tuning
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
- Azure ML - Hyperparameter tuning in Azure Machine Learning pipeline
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline#how-to-do-hyperparameter-tuning-in-azure-machine-learning-pipeline
- Azure ML - Random, Grid, Bayesian sampling
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#sampling-the-hyperparameter-space
- Azure ML - How to Train scikit-learn
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn#prepare-the-training-script
- Azure ML - How to train TensorFlow
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-tensorflow
- Azure ML - How to train Keras
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-keras
- Azure ML - How to train PyTorch
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch
- Azure ML - Logging with MLflow
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=jobs#getting-started
- MLflow - Autologging of frameworks
    - https://mlflow.org/docs/latest/tracking.html#automatic-logging
"""
import argparse

from distutils.dir_util import copy_tree
from pathlib import Path

import mlflow.sklearn
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def get_data(data):
    """Get data and return train/test splits"""
    df = pd.read_csv(data)
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Return split data
    return X_train, X_test, y_train, y_test

def get_hyperparameters(**kwargs):
    """Set hyperparameters here
    
    References
    ----------
    - Understand how to pass hyperparameters to a sklearn pipeline
        - https://stackoverflow.com/questions/66388056/why-does-sklearn-pipeline-set-params-not-work
    """
    hyperparameters = {
        "estimator__C": kwargs.get("c_value"),
        "estimator__kernel": kwargs.get("kernel"),
        "estimator__coef0": kwargs.get("coef0"),
    }
    return hyperparameters

def save_outputs(model, model_dir, X_test, y_test, test_data_dir):
    """Save outputs of the training process"""
    # Save model
    local_dir = "model"
    mlflow.sklearn.save_model(model, local_dir)
    copy_tree(local_dir, model_dir)

    # Save test data
    X_test.to_csv(Path(test_data_dir) / "X_test.csv", index=False)
    y_test.to_csv(Path(test_data_dir) / "y_test.csv", index=False)

def train_model(hyperparameters, X_train, y_train):
    """Train the model with your chosen framework here"""
    # Model architecture
    model = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("estimator", SVC()),
        ]
    )

    # Set hyperparameters for training run
    model.set_params(**hyperparameters)

    # Train model
    model.fit(X_train, y_train)

    return model

def parse_args():
    """Parse args and hyperparameters"""
    parser = argparse.ArgumentParser()

    # Parse mandatory args (flags must match the component command, e.g. --data)
    parser.add_argument("--data", help="Path to data for training", type=str)
    parser.add_argument("--outputs_model", help="Path to write the trained model to", type=str)
    parser.add_argument("--outputs_test_data", help="Path to write the test data to", type=str)

    # Parse hyperparameter args
    parser.add_argument("--C", dest="c_value", help="Regularisation parameter for the estimator", type=float)
    parser.add_argument("--kernel", help="Kernel for the estimator", type=str)
    parser.add_argument("--coef0", help="Independent term of the kernel function", type=float)

    # Get args
    args = parser.parse_args()

    return args


def main(**kwargs):
    """Train the model(s)

    Parameters
    ----------
    kwargs : dict
        Dictionary of all parsed arguments
    """
    # Logging
    # - Autologging works with (0.22.1 <= scikit-learn <= 1.1.3)
    # - See: https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog
    mlflow.sklearn.autolog()

    # Setup
    X_train, X_test, y_train, y_test = get_data(kwargs.get("data"))
    hyperparameters = get_hyperparameters(**kwargs)

    # Train
    model = train_model(hyperparameters, X_train, y_train)

    # Output
    save_outputs(model, kwargs.get("outputs_model"), X_test, y_test, kwargs.get("outputs_test_data"))
    

if __name__ == "__main__":
    """Entrypoint for training the model(s)"""
    main(**vars(parse_args()))
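
Since the error reports that no blobs were found at the model asset path, one sanity check is to log what actually lands in the mounted output folders before the job ends. A minimal sketch with a hypothetical helper (standard library only) that could be called at the end of save_outputs:

# Hypothetical debugging helper (not part of the job as submitted): print every
# file written under an output folder so the job log shows whether the MLflow
# model files (MLmodel, model.pkl, ...) reached the output path at all.
import os


def list_output_files(output_dir):
    """Print every file under output_dir, relative to it."""
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            print(os.path.relpath(os.path.join(root, name), output_dir))

# Example usage inside save_outputs, after copy_tree(local_dir, model_dir):
#     list_output_files(model_dir)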
