Scott H

Reputation: 2692

How do you get the run parameters and runId within Databricks notebook?

When running a Databricks notebook as a job, you can specify job or run parameters that can be used within the code of the notebook. However, it wasn't clear from documentation how you actually fetch them. I'd like to be able to get all the parameters as well as job id and run id.

Upvotes: 9

Views: 24252

Answers (2)

Cloudkollektiv

Reputation: 14669

Nowadays you can easily get a job's parameters from within the notebook through the widgets API. This is described reasonably well in the official Databricks documentation. Below I'll walk through the steps you have to take to get there; it is fairly easy.

  1. Create a new notebook, or use an existing one, that accepts some parameters. We want the job_id and run_id, and let's also add two user-defined parameters, environment and animal.

     # Get parameters from job
     job_id = dbutils.widgets.get("job_id")
     run_id = dbutils.widgets.get("run_id")
     environment = dbutils.widgets.get("environment")
     animal = dbutils.widgets.get("animal")
    
     print(job_id)
     print(run_id)    
     print(environment)
     print(animal)
    
  2. Now go to Workflows > Jobs and create a parameterised job. Make sure you select the correct notebook and specify the parameters for the job at the bottom. According to the documentation, we need to use curly brackets for the parameter values of job_id and run_id, so that Databricks fills them in at run time. For the other parameters, we can pick values ourselves. (A sketch of an equivalent parameter mapping for the Jobs API follows the list below.)

Note: You are not allowed to read the job_id and run_id directly from the notebook for security reasons (as you can see from the stack trace when you try to access the attributes of the context). Within a notebook you are in a different context; those parameters live at a "higher" context.

  3. Run the job and observe that it outputs something like:

     137355915119346
     7492
     dev
     squirrel
     Command took 0.09 seconds 
    
  4. You can even set default parameters in the notebook itself, which will be used if you run the notebook interactively or if it is triggered from a job without parameters. This makes testing easier and lets you fall back on sensible defaults.

     # Adding widgets to a notebook
     dbutils.widgets.text("environment", "tst")
     dbutils.widgets.text("animal", "turtle")
    
     # Removing widgets from a notebook
     dbutils.widgets.remove("environment")
     dbutils.widgets.remove("animal")
    
     # Or removing all widgets from a notebook
     dbutils.widgets.removeAll()
    
  5. And last but not least, I tested this on different cluster types and so far have found no limitations. My current settings are:

     spark.databricks.cluster.profile serverless
     spark.databricks.passthrough.enabled true
     spark.databricks.pyspark.enableProcessIsolation true
     spark.databricks.repl.allowedLanguages python,sql
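
As mentioned in step 2, you can also supply the parameters programmatically. Below is a minimal sketch (not from the original answer) of the same parameter mapping expressed as a Python dict, the way you would pass it as a notebook task's base_parameters through the Jobs API; the notebook path is hypothetical, and the curly-bracket values are the placeholders Databricks substitutes at run time.

# Minimal sketch (assumptions noted above): the parameters from step 2 as a
# notebook task's base_parameters mapping for the Jobs API.
notebook_task = {
    "notebook_path": "/Users/me/parameterised_notebook",  # hypothetical path
    "base_parameters": {
        "job_id": "{{job_id}}",    # filled in by Databricks at run time
        "run_id": "{{run_id}}",    # filled in by Databricks at run time
        "environment": "dev",
        "animal": "squirrel",
    },
}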
    

Upvotes: 7

Scott H

Reputation: 2692

Job/run parameters

When the notebook is run as a job, any job parameters can be fetched as a dictionary using the dbutils package that Databricks automatically provides and imports. Here's the code:

run_parameters = dbutils.notebook.entry_point.getCurrentBindings()

If the job parameters were {"foo": "bar"}, then the result of the code above gives you the dict {'foo': 'bar'}. Note that Databricks only allows job parameter mappings of str to str, so keys and values will always be strings.

Note that if the notebook is run interactively (not as a job), the dict will be empty. The getCurrentBindings() method also appears to return any active widget values for the notebook (when run interactively).
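
If you want a plain Python dict plus a fallback for interactive runs, a small sketch along these lines should work (the parameter name environment and its default are just illustrative):

# Wrap the bindings in dict() to get a plain str-to-str Python dict, then
# use .get() with a default for interactive runs without parameters.
run_parameters = dict(dbutils.notebook.entry_point.getCurrentBindings())
environment = run_parameters.get("environment", "tst")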

Getting the jobId and runId

To get the jobId and runId you can fetch a context JSON from dbutils that contains that information (adapted from the Databricks forum):

import json

# The notebook context is exposed as a JSON string; parse it and pull out
# the run and job identifiers.
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)
run_id_obj = context.get('currentRunId', {})
run_id = run_id_obj.get('id', None) if run_id_obj else None
job_id = context.get('tags', {}).get('jobId', None)

So within the context object, the path of keys for runId is currentRunId > id, and the path for jobId is tags > jobId.
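
For orientation, here is a trimmed-down, hypothetical example of what the parsed context looks like around those keys (values are made up, and the real JSON contains many more tags):

# Hypothetical, trimmed-down context dict illustrating the key paths above.
example_context = {
    "currentRunId": {"id": 456789},
    "tags": {"jobId": "123456789"},
}
run_id = example_context.get("currentRunId", {}).get("id")   # 456789
job_id = example_context.get("tags", {}).get("jobId")        # '123456789'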

Upvotes: 18
