kanimbla

Reputation: 890

Databricks multi-task jobs - pass MLflow run_id from one task to next task

I would like to create a Databricks multi-task job with the following sequence:

Is it possible to pass the run_id from task 1 to task 2, and if so, is there any documentation on how this could be done?

Upvotes: 0

Views: 665

Answers (3)

Kiana Hadd

Reputation: 11

You need three notebooks, one parent notebook and two child notebooks, to perform this task.

In the first child notebook, train the model with results logged to the MLflow tracking server, then include the following code at the end.

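import mlflow

# Get the current run ID (assumes a run is still active at this point)
run = mlflow.active_run()
run_id = run.info.run_id
mlflow.end_run()

# Return the run ID to the caller; dbutils.notebook.exit expects a string
exit_struct = {"runId": str(run_id), "status": "Success"}
dbutils.notebook.exit(str(exit_struct))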
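
The exit value crosses the notebook boundary as a string, which is why the parent has to parse it. A minimal sketch of that round trip in plain Python, with a made-up run ID standing in for the real one (no Databricks or MLflow calls are involved here):

```python
from ast import literal_eval

# Hypothetical run ID, used only for illustration.
run_id = "0123456789abcdef"

# Child side: build the result and serialize it to a string.
exit_struct = {"runId": str(run_id), "status": "Success"}
payload = str(exit_struct)

# Parent side: turn the string back into a dict and read the run ID.
returnvalue = literal_eval(payload)
recovered_run_id = returnvalue.get("runId")
```

literal_eval only accepts Python literals, so this works as long as the dict holds plain strings, numbers, and similar values.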

Then, retrieve the run_id in the parent notebook and pass it to the second child notebook using the following code:

from ast import literal_eval
returnvalue = literal_eval(
    dbutils.notebook.run('CHILD_NOTEBOOK_ONE', 0, {SET_OF_PARAMS_TO_RUN_CHILD_NOTEBOOK_ONE})
)

Now, you can run the second child notebook.

literal_eval(
    dbutils.notebook.run('SECOND_CHILD_NOTEBOOK', 0, {
        'RUN_ID': returnvalue.get("runId")}))

Finally, in the second child notebook, read the passed run_id from the notebook widget and resume that MLflow run.

import mlflow
RUN_ID = dbutils.widgets.get("RUN_ID")
mlflow.start_run(run_id=RUN_ID)
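
The parent notebook's hand-off can also be wrapped in one function. This is only a sketch: run_pipeline is a hypothetical helper, and run_notebook is an injected callable assumed to behave like dbutils.notebook.run (path, timeout in seconds, arguments dict, returning the child's exit value as a string). The second child's return value is ignored here.

```python
from ast import literal_eval


def run_pipeline(run_notebook, first_child_params):
    """Run both child notebooks in order, handing the MLflow run ID to the second.

    run_notebook is assumed to behave like dbutils.notebook.run:
    (path, timeout_seconds, arguments) -> exit value as a string.
    """
    raw = run_notebook("CHILD_NOTEBOOK_ONE", 0, first_child_params)
    result = literal_eval(raw)

    # Stop early if the first child did not report success.
    if result.get("status") != "Success":
        raise RuntimeError(f"CHILD_NOTEBOOK_ONE failed: {result!r}")

    run_id = result["runId"]
    run_notebook("SECOND_CHILD_NOTEBOOK", 0, {"RUN_ID": run_id})
    return run_id
```

In a Databricks parent notebook, this would be called with dbutils.notebook.run as the first argument and the first child's parameter dict as the second. Injecting the callable also lets the flow be checked outside Databricks with a stand-in function.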

Upvotes: 0

Alex Ott

Reputation: 87144

As of right now (this may change), it is not possible to pass results between the tasks of a multi-task job.

But you can call another notebook as a child job if you use notebook workflows and the function dbutils.notebook.run:

# notebook 1
... training code ...
dbutils.notebook.run("notebook2", 300, {"run_id": run_id})

Upvotes: 2

You can consider the following steps.

Get the run ID from notebook 1 by running the following code and parsing the required run_id from the response.

import requests
import json

class BearerAuth(requests.auth.AuthBase):
    """Attach a bearer token to each outgoing request."""
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers["authorization"] = "Bearer " + self.token
        return r

# Replace 'instancename' and 'enteryourtoken' with your workspace host and access token
response = requests.get('https://instancename/api/2.0/jobs/list', auth=BearerAuth('enteryourtoken')).json()
response

That value can then be passed from task 1 to task 2.

Upvotes: 0
