Reputation: 55
My colleagues and I are facing an issue when trying to run our Databricks notebook from Azure Data Factory (ADF). The error comes from MLflow.
The command that is failing is the following:
import json
from datetime import datetime

import mlflow

# Take the parent notebook path to use as the path for the experiment
context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
nb_base_path = context['extraContext']['notebook_path'][:-len("00_training_and_validation")]
experiment_path = nb_base_path + 'trainings'
mlflow.set_experiment(experiment_path)
experiment = mlflow.get_experiment_by_name(experiment_path)
experiment_id = experiment.experiment_id
run = mlflow.start_run(experiment_id=experiment_id, run_name=f"run_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}")
And the error that is thrown is:
An exception was thrown from a UDF: 'mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: No experiment ID was specified. An experiment ID must be specified in Databricks Jobs and when logging to the MLflow server from outside the Databricks workspace. If using the Python fluent API, you can set an active experiment under which to create runs by calling mlflow.set_experiment("/path/to/experiment/in/workspace") at the start of your program.', from , line 32.
The pipeline only runs the notebook from ADF; it has no other steps, and the cluster we are using runs Databricks Runtime 7.3 ML.
Could you please help us?
Thank you in advance!
Upvotes: 1
Views: 1526
Reputation: 43
I think you need to set the artifact URI and specify the experiment ID explicitly (in case the artifact directory contains more than one experiment ID).
Reference: https://www.mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded
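A rough sketch of what that could look like, not tested in ADF. The helper names (experiment_path_for, resolve_experiment_id, start_training_run) and the example paths are made up for illustration; only the mlflow calls themselves (get_experiment_by_name, create_experiment, start_run) are real API. Since the error is raised from a UDF, it may be that the MLflow call runs on a worker where the experiment set on the driver is not visible, so one thing to try is resolving the experiment ID once on the driver and passing it explicitly wherever a run is started.

```python
import posixpath
from datetime import datetime


def experiment_path_for(notebook_path, experiment_name="trainings"):
    """Build the experiment path next to the notebook (hypothetical helper).

    Uses the notebook's parent folder instead of slicing off a hard-coded
    notebook name, so it keeps working if the notebook is renamed.
    """
    return posixpath.join(posixpath.dirname(notebook_path), experiment_name)


def resolve_experiment_id(experiment_path, artifact_location=None):
    """Return the ID of the experiment at experiment_path, creating it if missing.

    artifact_location is optional; passing it is an assumption based on the
    suggestion to set the artifact URI explicitly.
    """
    import mlflow  # imported here so the path helper works without mlflow

    experiment = mlflow.get_experiment_by_name(experiment_path)
    if experiment is None:
        # create_experiment returns the new experiment's ID
        return mlflow.create_experiment(
            experiment_path, artifact_location=artifact_location
        )
    return experiment.experiment_id


def start_training_run(experiment_id):
    """Start a run under an explicitly given experiment ID."""
    import mlflow

    run_name = f"run_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
    return mlflow.start_run(experiment_id=experiment_id, run_name=run_name)
```

In the notebook, you would call resolve_experiment_id(experiment_path_for(context['extraContext']['notebook_path'])) once on the driver, then pass the resulting ID as a plain string argument into any function (including the UDF) that starts or logs to a run, rather than relying on the active experiment.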
Upvotes: 0