
Reputation: 43

How can I start using MLflow on Databricks with an existing trained model?

I have an existing model that was trained on Azure, and I want to fully integrate it and start using it on Databricks. What's the best way to do this? How can I load the model into the Databricks model workflow? I have the model as a pickle file.

I have read almost all of the Databricks documentation, but 99% of it covers new models trained on Databricks and never importing existing models.

Upvotes: 3

Views: 3856

Answers (2)

Daniel Schneider

Reputation: 2046

Since MLflow has a standardized model storage format, you just need to bring the model files over and start using them with the MLflow package. In addition, you can register the model in the workspace's model registry using mlflow.register_model() and then use it from there. These are the steps:

  1. On the AzureML side, I assume you have an MLflow model saved to disk (using mlflow.sklearn.save_model() or mlflow.sklearn.autolog() -- or some other mlflow.<flavor>). That should give you a folder that contains an MLmodel file and, depending on the model's flavor, a few more files -- like the below:
mlflow-model
├── MLmodel
├── conda.yaml
├── model.pkl
└── requirements.txt

Note: You can download the model from the AzureML Workspace using the v2 CLI like so: az ml model download --name <model_name> --version <model_version>

  2. Open a Databricks Notebook and make sure it has mlflow installed:
%pip install mlflow
  3. Upload the MLflow model files to the DBFS location accessible to the cluster.

  4. In the Notebook, register the model using MLflow (adjust the dbfs: path to the location where the model was uploaded to):

import mlflow

model_version = mlflow.register_model("dbfs:/FileStore/shared_uploads/mlflow-model/", "AzureMLModel")

Now your model is registered in the Workspace's model registry like any model that was created from a Databricks session. So, you can access it from the registry like so:

model = mlflow.pyfunc.load_model(f"models:/AzureMLModel/{model_version.version}")

input_example = {
   "sepal_length": [5.1,4.8],
   "sepal_width": [3.5,4.4],
   "petal_length": [1.4,2.0],
   "petal_width": [0.2,0.1]
 }
model.predict(input_example)


Or use the model as a spark_udf:

import pandas as pd
from pyspark.sql.functions import struct

model_udf = mlflow.pyfunc.spark_udf(spark=spark, model_uri=f"models:/AzureMLModel/{model_version.version}", result_type='string')
spark_df = spark.createDataFrame(pd.DataFrame(input_example))
# Pass the feature columns to the UDF as a struct
spark_df = spark_df.withColumn('prediction', model_udf(struct(*spark_df.columns)))
display(spark_df)


Note that I am using mlflow.pyfunc to load the model since every MLFlow model needs to support the pyfunc flavor. That way, you don't need to worry about the native flavor of the model.

Upvotes: 3

Andre

Reputation: 354

  1. If your source model is already in an MLflow tracking server, use mlflow-export-import:

https://github.com/mlflow/mlflow-export-import

  2. If your source model was not trained with MLflow:

How do I create an MLflow run from a model I have trained elsewhere?

https://github.com/amesar/mlflow-resources/blob/master/MLflow_FAQ.md#how-do-i-create-an-mlflow-run-from-a-model-i-have-trained-elsewhere

Upvotes: 4
