Gohmz

Reputation: 1286

Databricks Python wheel based on Databricks Workflow. Access job_id & run_id

I'm using Python (as a Python wheel application) on Databricks.

I deploy & run my jobs using dbx.

I defined some Databricks Workflows using Python wheel tasks.

Everything is working fine, but I'm having trouble extracting "databricks_job_id" & "databricks_run_id" for logging/monitoring purposes.

I'm used to defining {{job_id}} & {{run_id}} as parameters in a "Notebook Task" or other task types (see How do you get the run parameters and runId within Databricks notebook?), but with a Python wheel task I'm not able to define these:

With a Python wheel task, parameters are basically an array of strings:

["/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/myparameter.yml","/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/conf"]

Adding "{{job_id}}" & "{{run_id}}" to this array doesn't seem to work...

Do you have any ideas? I don't want to call a REST API during my workload just to extract these IDs...

Upvotes: 0

Views: 790

Answers (1)

Vamsi Bitra

Reputation: 2729

Follow one of the approaches below to get the run ID and job ID.

Approach 1:

code:

import requests

# The workspace URL and token below are placeholders; substitute your own.
url_list = "https://adb-1090xxxxxxx1.12.azuredatabricks.net/api/2.0/jobs/runs/list"
headers = {
    "Authorization": "Bearer dapia23481xxxxxxx78e7f",
    "Content-Type": "application/json",
}

# List the job runs and fail early on an HTTP error.
response = requests.get(url_list, headers=headers)
response.raise_for_status()

# Default to an empty list in case the response has no "runs" key.
for job_run in response.json().get("runs", []):
    job_id = job_run["job_id"]
    run_id = job_run["run_id"]
    print(f"Job ID: {job_id}, Run ID: {run_id}")
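The loop above prints every run that the API returns. If you already know the job ID (for example from your deployment configuration), a small filter can narrow the list to the most recent run of that job. This is only a sketch: `latest_run_for_job` and `sample_runs` are hypothetical names, the sample values are made up for illustration, and it assumes each run entry carries `job_id`, `run_id` and `start_time` fields as in the `runs/list` response.

```python
def latest_run_for_job(runs, job_id):
    """Return the run dict with the highest start_time for job_id, or None."""
    candidates = [run for run in runs if run.get("job_id") == job_id]
    if not candidates:
        return None
    return max(candidates, key=lambda run: run.get("start_time", 0))


# Hypothetical data shaped like the "runs" array of the response.
sample_runs = [
    {"job_id": 1, "run_id": 10, "start_time": 1000},
    {"job_id": 1, "run_id": 11, "start_time": 2000},
    {"job_id": 2, "run_id": 20, "start_time": 3000},
]

latest = latest_run_for_job(sample_runs, 1)
if latest is not None:
    print(f"Job ID: {latest['job_id']}, Run ID: {latest['run_id']}")
```

In the real code you would pass `response.json().get("runs", [])` instead of `sample_runs`.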


Approach 2:

Using Log Analytics: if you have configured diagnostic logs in Azure Databricks, you can use KQL queries to get the job ID and run ID:

DatabricksJobs
| where TimeGenerated > ago(48h)
| limit 10 

For more information, refer to this SO thread by CHEEKATLAPRADEEP.

Approach 3:

First, pass {{job_id}} and {{run_id}} as parameters when you define the job or task, then fetch and print the values:

print(f"""
  job_id: {dbutils.widgets.get('job_id')}
  run_id: {dbutils.widgets.get('run_id')}
  """)

For more information, refer to this blog by Jitesh Soni.
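`dbutils.widgets` is a notebook feature. In a Python wheel task the parameters array reaches the entry point as command-line arguments, so the same idea could be sketched with `argparse`. This assumes the parameters array is extended with something like `"--job-id", "{{job_id}}", "--run-id", "{{run_id}}"` and that the workspace substitutes those placeholders for wheel tasks, which the question suggests may not always happen. The flag names and the `parse_args`, `resolved` and `main` helpers are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the wheel task parameters (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Hypothetical wheel entry point")
    # Existing positional parameters, e.g. the config file paths.
    parser.add_argument("paths", nargs="*")
    # Hypothetical flags carrying the templated values.
    parser.add_argument("--job-id", default=None)
    parser.add_argument("--run-id", default=None)
    return parser.parse_args(argv)


def resolved(value):
    """Return value, or None if it is missing or still a raw {{...}} placeholder."""
    if not value or value.startswith("{{"):
        return None
    return value


def main(argv=None):
    args = parse_args(argv)
    job_id = resolved(args.job_id)
    run_id = resolved(args.run_id)
    print(f"job_id: {job_id}, run_id: {run_id}")
    return job_id, run_id
```

The `resolved` check makes it visible in the logs when the placeholders were passed through literally instead of being substituted.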

Upvotes: 1
