Reputation: 1286
I'm using Python (as a Python wheel application) on Databricks.
I deploy and run my jobs using dbx.
I defined some Databricks Workflows using Python wheel tasks.
Everything is working fine, but I'm having trouble extracting "databricks_job_id" and "databricks_run_id" for logging/monitoring purposes.
I'm used to defining {{job_id}} and {{run_id}} as parameters in a "Notebook Task" or other task types (see How do you get the run parameters and runId within Databricks notebook?), but with a Python wheel task I'm not able to define these:
With a Python wheel task, parameters are basically an array of strings:
["/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/myparameter.yml","/dbfs/Shared/dbx/projects/myproject/66655665aac24e748d4e7b28c6f4d624/artifacts/conf"]
Adding "{{job_id}}" and "{{run_id}}" to this array doesn't seem to work...
Do you have any ideas? I don't want to call a REST API during my workload just to extract these IDs.
Upvotes: 0
Views: 790
Reputation: 2729
Follow the approaches below to get the run ID and job ID.
Approach 1:
code:
import requests

# Workspace URL and token are placeholders; substitute your own values.
url_list = "https://adb-1090xxxxxxx1.12.azuredatabricks.net/api/2.0/jobs/runs/list"
headers = {
    'Authorization': 'Bearer dapia23481xxxxxxx78e7f',
    'Content-Type': 'application/json'
}

# List job runs through the Jobs API and print each run's identifiers.
response = requests.get(url_list, headers=headers).json()
# .get() avoids a KeyError if the response has no "runs" key.
for job_run in response.get("runs", []):
    job_id = job_run["job_id"]
    run_id = job_run["run_id"]
    print(f"Job ID: {job_id}, Run ID: {run_id}")
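Note that the loop above prints every run returned by the API, not only the one currently executing. A minimal sketch of narrowing the list to runs still in progress, assuming each entry carries a state.life_cycle_state field as in the Jobs API response. The helper name active_run_ids and the sample payload are made up for illustration:

```python
def active_run_ids(response):
    """Return (job_id, run_id) pairs for runs whose life cycle state is RUNNING."""
    pairs = []
    for run in response.get("runs", []):
        # Assumption: each run has a nested "state" dict with "life_cycle_state".
        state = run.get("state", {}).get("life_cycle_state")
        if state == "RUNNING":
            pairs.append((run["job_id"], run["run_id"]))
    return pairs


# Hypothetical payload shaped like a runs/list response, for illustration only.
sample = {
    "runs": [
        {"job_id": 1, "run_id": 10, "state": {"life_cycle_state": "RUNNING"}},
        {"job_id": 2, "run_id": 20, "state": {"life_cycle_state": "TERMINATED"}},
    ]
}
print(active_run_ids(sample))
```

If several runs are active at once, this still cannot tell which one is the caller, so it is only a partial answer.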
Approach 2:
Using Log Analytics: if you have configured diagnostic logs in Azure Databricks, you can use KQL queries to get the job ID and run ID:
DatabricksJobs
| where TimeGenerated > ago(48h)
| limit 10
For more information, refer to this SO thread by CHEEKATLAPRADEEP.
Approach 3:
First pass the parameters when defining the job or task, then fetch and print the values:
print(f"""
job_id: {dbutils.widgets.get('job_id')}
run_id: {dbutils.widgets.get('run_id')}
""")
For more info, refer to this blog by Jitesh Soni.
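dbutils.widgets is notebook-oriented, so in a Python wheel task the values would more likely arrive as command-line arguments. A minimal sketch, assuming the task's parameters array is extended with entries such as "--job-id", "{{job_id}}", "--run-id", "{{run_id}}" and that the placeholders are substituted before the entry point runs (not verified here; if dbx renders the deployment file with Jinja, the double braces may need escaping). The flag names and the parse_ids helper are hypothetical:

```python
import argparse


def parse_ids(argv):
    """Extract the hypothetical --job-id / --run-id flags; leave other arguments untouched."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-id", default=None)
    parser.add_argument("--run-id", default=None)
    # parse_known_args returns unrecognized arguments (e.g. config paths) separately.
    args, remaining = parser.parse_known_args(argv)
    return args.job_id, args.run_id, remaining


# Example argument list standing in for sys.argv[1:]; the values are made up.
job_id, run_id, rest = parse_ids(
    ["myparameter.yml", "--job-id", "123", "--run-id", "456"]
)
print(f"job_id: {job_id}, run_id: {run_id}, other args: {rest}")
```

In the wheel's entry point this would be called as parse_ids(sys.argv[1:]), which leaves the existing positional parameters (the YAML and conf paths) available in the remaining list.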
Upvotes: 1