Reputation: 381
I have an sklearn k-means model. I am training the model and saving it in a pickle file so I can deploy it later using the Azure ML library. The model that I am training uses a custom feature encoder called MultiColumnLabelEncoder. The pipeline model is defined as follows:
# Pipeline
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
import joblib
from multilabelencoder import MultiColumnLabelEncoder  # custom encoder, defined in its own script

kmeans = KMeans(n_clusters=3, random_state=0)
pipe = Pipeline([
    ("encoder", MultiColumnLabelEncoder()),
    ('k-means', kmeans),
])

# Training the pipeline
model = pipe.fit(visitors_df)
prediction = model.predict(visitors_df)

# Save the model in pickle/joblib format
filename = 'k_means_model.pkl'
joblib.dump(model, filename)
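For context, the MultiColumnLabelEncoder itself is not shown in the question; a minimal sketch of what such an encoder typically looks like (hypothetical, not necessarily the author's actual implementation):
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder(BaseEstimator, TransformerMixin):
    """Label-encode several (or all) categorical columns of a DataFrame."""
    def __init__(self, columns=None):
        self.columns = columns  # columns to encode; None means all columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        output = X.copy()
        cols = self.columns if self.columns is not None else output.columns
        for col in cols:
            output[col] = LabelEncoder().fit_transform(output[col])
        return output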
The model saving works fine. The deployment steps are the same as the steps in this link:
However, the deployment always fails with this error:
File "/var/azureml-server/create_app.py", line 3, in <module>
from app import main
File "/var/azureml-server/app.py", line 27, in <module>
import main as user_main
File "/var/azureml-app/main.py", line 19, in <module>
driver_module_spec.loader.exec_module(driver_module)
File "/structure/azureml-app/score.py", line 22, in <module>
importlib.import_module("multilabelencoder")
File "/azureml-envs/azureml_b707e8c15a41fd316cf6c660941cf3d5/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'multilabelencoder'
I understand that pickle/joblib has some problems unpickling the custom class MultiColumnLabelEncoder. That's why I defined this class in a separate Python script (which I also executed). I call this custom class in the training script, in the deployment script, and in the scoring file (score.py), but the import in score.py fails. So my question is: how can I import a custom Python module into the Azure ML deployment environment?
Thank you in advance.
EDIT: This is my .yml file:
name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.6.2
  - pip:
    - multilabelencoder==1.0.4
    - scikit-learn
    - azureml-defaults==1.0.74.*
    - pandas
channels:
  - conda-forge
Upvotes: 6
Views: 4704
Reputation: 26676
Try the following:
# Additional imports
from azureml.core import Environment, Workspace
from azureml.core.runconfig import RunConfiguration

# Assumed to exist already:
# workspace = Workspace.from_config()  # your Azure ML workspace
# pipeline_cluster                     # your compute target

# Modify your YAML file to include the private package
yaml_content = """
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.10.11
  - numpy
  - pip
  - scikit-learn
  - scipy
  - pandas
  - pip:
    - azureml-defaults
    - tempfile2
    - xlrd
    - mlflow
    - azureml-mlflow
"""

# Write the YAML to disk and build an Environment from it
with open("model-env.yml", "w") as f:
    f.write(yaml_content)
experiment_env = Environment.from_conda_specification("experiment_env", "model-env.yml")

# Upload the private wheel file; add_private_pip_wheel is a static method
# that stores the wheel in the workspace and returns a pip-installable URL
private_wheel_path = "path_to_your_private_wheel_file.whl"
whl_url = Environment.add_private_pip_wheel(workspace=workspace, file_path=private_wheel_path)

# Add the wheel URL to the environment's pip dependencies
conda_dep = experiment_env.python.conda_dependencies
conda_dep.add_pip_package(whl_url)
experiment_env.python.conda_dependencies = conda_dep

# Register the environment
experiment_env.register(workspace=workspace)

# Fetch the registered environment
registered_env = Environment.get(workspace, 'experiment_env')

# Create a new runconfig object for the pipeline
pipeline_run_config = RunConfiguration()

# Use the config: assign the compute target and environment
pipeline_run_config.target = pipeline_cluster
pipeline_run_config.environment = registered_env
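For completeness, a minimal sketch of how that run config is then consumed by a pipeline step (the script name and source directory are hypothetical):
from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    name="train",
    source_directory=".",
    script_name="train.py",  # hypothetical training script
    compute_target=pipeline_cluster,
    runconfig=pipeline_run_config,
)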
Upvotes: 0
Reputation: 11
An alternative method that works for me is to register a "model_src" directory containing both the pickled model and a custom module, instead of registering only the pickled model. Then, specify the custom module in the scoring script during deployment, e.g., using Python's os module. Example below using SDK v1.
Example of "model_src"-directory
model_src
│
├─ utils # your custom module
│ └─ multilabelencoder.py
│
└─ models
├─ score.py
└─ k_means_model_45.pkl # your pickled model file
Register "model_src" in sdk-v1
model = Model.register(model_path="./model_src",
model_name="kmeans",
description="model registered as a directory",
workspace=ws
)
Correspondingly, when defining the inference config:
from azureml.core import Environment
from azureml.core.model import InferenceConfig

deployment_folder = './model_src'
script_file = 'models/score.py'
service_env = Environment.from_conda_specification("kmeans-service",
                                                   './environment.yml'  # wherever the yml is located locally
                                                   )
inference_config = InferenceConfig(source_directory=deployment_folder,
                                   entry_script=script_file,
                                   environment=service_env)
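A hedged sketch of the deployment call that typically follows, assuming an ACI target (service name and sizing are illustrative):
from azureml.core.webservice import AciWebservice
from azureml.core.model import Model

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(workspace=ws,
                       name="kmeans-service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)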
Content of the scoring script, e.g., score.py:
import os
import sys
import joblib

# Specify model_src as your parent: AZUREML_MODEL_DIR points to the
# registered model's root, and model_src sits inside it
deploy_dir = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model_src')

# Make the custom module importable
sys.path.append("{0}/utils".format(deploy_dir))
from multilabelencoder import MultiColumnLabelEncoder

def init():
    global model
    # The custom encoder class must be importable for unpickling the model
    encoder = MultiColumnLabelEncoder()  # use as intended downstream
    # Load the deployed model from its path inside the registered directory
    model = joblib.load('{}/models/k_means_model_45.pkl'.format(deploy_dir))
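The entry script also needs a run() function; a minimal sketch, assuming the request payload is JSON with a 'data' field (the exact schema is up to you):
import json
import pandas as pd

def run(raw_data):
    # Parse the JSON payload into a DataFrame and return predictions
    data = pd.DataFrame(json.loads(raw_data)['data'])
    return model.predict(data).tolist()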
This method gives me the flexibility to import various custom scripts into my scoring script.
Upvotes: 1
Reputation: 111
I was facing the same problem, trying to deploy a model that depends on some of my own scripts, and got this error message:
ModuleNotFoundError: No module named 'my-own-module-name'
I found this "Private wheel files" solution in the MS documentation and it works. The difference from the solution above is that I do not need to publish my scripts to pip. I think many people might face the same situation: for some reason you cannot or do not want to publish your scripts. Instead, your own wheel file is saved under your own blob storage.
Following the documentation, I did the steps below and it worked for me. Now I can deploy my model that depends on my own scripts.
Package the scripts that the model depends on into a wheel file, saved locally:
"your_path/your-wheel-file-name.whl"
Follow the instructions in the "Private wheel files" section of the MS documentation. Below is the code that worked for me:
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

# Upload the wheel to the workspace and get a pip-installable URL
whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="your_path/your-wheel-file-name.whl")

myenv = CondaDependencies()
myenv.add_pip_package("scikit-learn==0.22.1")
myenv.add_pip_package("azureml-defaults")
myenv.add_pip_package(whl_url)

with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())
My environment file now looks like:
name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.6.2
  - pip:
    - scikit-learn==0.22.1
    - azureml-defaults
    - https://myworkspaceid.blob.core/azureml/Environment/azureml-private-packages/my-wheel-file-name.whl
channels:
  - conda-forge
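This myenv.yml can then be turned into an Environment for the inference config, using the same API shown in the answer above (the environment name is illustrative):
from azureml.core import Environment

service_env = Environment.from_conda_specification("private-wheel-env", "myenv.yml")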
I'm new to Azure ML, learning by doing and communicating with the community. This solution works fine for me; I hope it helps.
Upvotes: 5
Reputation: 381
In fact, the solution was to import my customized class MultiColumnLabelEncoder as a pip package (you can find it through pip install multilabelencoder==1.0.5). Then I passed the pip package to the .yml file (or to the InferenceConfig of the Azure ML environment). In the score.py file, I imported the class as follows:
import os
import joblib
from multilabelencoder import multilabelencoder

def init():
    global model
    # The custom encoder must be importable for unpickling the model
    encoder = multilabelencoder.MultiColumnLabelEncoder()
    # Get the path where the deployed model can be found
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'k_means_model_45.pkl')
    model = joblib.load(model_path)
Then the deployment was successful. One more important thing: I had to use the same pip package (multilabelencoder) in the training pipeline, as here:
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from multilabelencoder import multilabelencoder

kmeans = KMeans(n_clusters=3, random_state=0)
pipe = Pipeline([
    ("encoder", multilabelencoder.MultiColumnLabelEncoder(columns)),  # columns: list of column names to encode
    ('k-means', kmeans),
])

# Training the pipeline
trainedModel = pipe.fit(df)
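For completeness, the trained pipeline is then saved the same way as in the question, so that score.py can unpickle it:
import joblib

# Save the trained pipeline for deployment
joblib.dump(trainedModel, 'k_means_model_45.pkl')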
Upvotes: 4