Reputation: 57
I am a beginner in Azure Databricks and I want to use the REST APIs to create a cluster and submit a job from Python, but I am stuck. Also, if I already have an existing cluster, what would the code look like? Running the code below gives me a job ID, but I cannot see any output.
import requests

DOMAIN = ''
TOKEN = ''

response = requests.post(
    'https://%s/api/2.0/jobs/create' % (DOMAIN),
    headers={'Authorization': 'Bearer %s' % TOKEN},
    json={
        "name": "SparkPi spark-submit job",
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
        },
        "spark_submit_task": {
            "parameters": [
                "--class",
                "org.apache.spark.examples.SparkPi",
                "dbfs:/FileStore/sparkpi_assembly_0_1.jar",
                "10"
            ]
        }
    }
)

if response.status_code == 200:
    print(response.json())
else:
    print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))
Upvotes: 4
Views: 4323
Reputation: 87069
Jobs on Databricks can be executed in two ways (see docs):
- on a new cluster, which is what your code does right now
- on an existing cluster: remove the new_cluster block and add the existing_cluster_id field with the ID of the existing cluster. If you don't have a cluster yet, you can create one via the Clusters API.

When you create a job, you get back a job ID that can be used to edit or delete the job. Creating a job does not run it, which is why you see no output. You can launch the job using the Run Now API. If you just want to execute the workload without creating a job in the UI, look at the Runs Submit API instead. Either API returns the ID of the specific job run; you can then use the Runs Get API to check the status of the run, or the Runs Get Output API to get the results of the execution.
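To make the flow above concrete, here is a minimal sketch, not a definitive implementation. It assumes the Jobs API 2.0 endpoints run-now, runs/get and runs/get-output under the same URL pattern as in the question. The helper names (use_existing_cluster, run_now, wait_for_run, get_output) are hypothetical, and the http argument is assumed to be the requests module or a requests.Session. One caveat, as far as I recall from the Jobs API docs: spark_submit_task runs only on new clusters, so with existing_cluster_id you would likely need another task type such as spark_jar_task. What runs/get-output returns also depends on the task type.

```python
import time

JOBS_API = "https://%s/api/2.0/jobs/%s"  # same URL pattern as in the question


def auth(token):
    """Build the bearer-token header used by every call."""
    return {"Authorization": "Bearer %s" % token}


def use_existing_cluster(job_spec, cluster_id):
    """Copy job_spec, swapping new_cluster for existing_cluster_id."""
    spec = {k: v for k, v in job_spec.items() if k != "new_cluster"}
    spec["existing_cluster_id"] = cluster_id
    return spec


def run_now(http, domain, token, job_id):
    """Trigger a run of an already created job and return its run_id."""
    response = http.post(JOBS_API % (domain, "run-now"),
                         headers=auth(token), json={"job_id": job_id})
    response.raise_for_status()
    return response.json()["run_id"]


def wait_for_run(http, domain, token, run_id, poll_seconds=10):
    """Poll runs/get until the run reaches a final life cycle state."""
    final_states = ("TERMINATED", "SKIPPED", "INTERNAL_ERROR")
    while True:
        response = http.get(JOBS_API % (domain, "runs/get"),
                            headers=auth(token), params={"run_id": run_id})
        response.raise_for_status()
        state = response.json()["state"]
        if state["life_cycle_state"] in final_states:
            return state
        time.sleep(poll_seconds)


def get_output(http, domain, token, run_id):
    """Fetch the run output; its content depends on the task type."""
    response = http.get(JOBS_API % (domain, "runs/get-output"),
                        headers=auth(token), params={"run_id": run_id})
    response.raise_for_status()
    return response.json()


# Usage (assumes DOMAIN, TOKEN and the job_id returned by jobs/create):
#   import requests
#   run_id = run_now(requests, DOMAIN, TOKEN, job_id)
#   print(wait_for_run(requests, DOMAIN, TOKEN, run_id))
#   print(get_output(requests, DOMAIN, TOKEN, run_id))
```

Passing the HTTP client in as an argument keeps the helpers easy to try against a stub before pointing them at a real workspace.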
Upvotes: 4