Reputation: 13
I would like to execute the content of an Azure Databricks notebook with the REST Jobs API, in the following manner:
1. Trigger a one-time run of the notebook via the runs/submit endpoint.
2. Retrieve the passed arguments inside the notebook.
For point 1 I use the following (as suggested by the documentation here):
curl -n -X POST -H 'Content-Type: application/json' -d '{
  "name": "endpoint job",
  "existing_cluster_id": "xxx",
  "notebook_task": {"notebook_path": "path"},
  "base_parameters": {
    "input_multiple_polygons": "input_multiple_polygons",
    "input_date_start": "input_date_start",
    "input_date_end": "input_date_end"
  }
}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
To address point 2, I tried the following approaches, all without success:
2.1. Approach 1:
input = spark.conf.get("base_parameters", "default")
2.2. Approach 2:
input = spark.sparkContext.getConf().getAll()
2.3. Approach 3:
a = dbutils.widgets.getArgument("input_multiple_polygons", "default")
b = dbutils.widgets.getArgument("input_date_start", "default")
c = dbutils.widgets.getArgument("input_date_end", "default")
input = [a,b,c]
2.4. Approach 4 (as per the official documentation here):
a = dbutils.widgets.get("input_multiple_polygons")
b = dbutils.widgets.get("input_date_start")
c = dbutils.widgets.get("input_date_end")
input = [a,b,c]
The REST Jobs endpoint works fine and the run completes successfully; however, none of the four approaches outlined above delivers the arguments to the PySpark context.
I am sure I am doing something incorrect in either the curl call or the argument-retrieval part, but I can't identify the problem. Can anyone suggest where the issue may be?
Upvotes: 0
Views: 1178
Reputation: 794
It looks like you are not enclosing base_parameters as an element within notebook_task. Can you try something like the below? I assume you are passing the right values for base_parameters, since the example you shared gives each parameter a value identical to its name.
curl -n -X POST -H 'Content-Type: application/json' -d '{
  "name": "endpoint job",
  "existing_cluster_id": "xxx",
  "notebook_task": {
    "notebook_path": "path",
    "base_parameters": {
      "input_multiple_polygons": "input_multiple_polygons",
      "input_date_start": "input_date_start",
      "input_date_end": "input_date_end"
    }
  }
}' https://yyy.azuredatabricks.net/api/2.0/jobs/runs/submit
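On the notebook side, approach 4 from your question should then pick these values up unchanged. A minimal sketch, assuming the same three parameter names as in the curl call:

# Databricks notebook cell: dbutils is injected by the runtime, no import needed.
# Each call returns the value passed under base_parameters for that name.
input_multiple_polygons = dbutils.widgets.get("input_multiple_polygons")
input_date_start = dbutils.widgets.get("input_date_start")
input_date_end = dbutils.widgets.get("input_date_end")
args = [input_multiple_polygons, input_date_start, input_date_end]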
An easy way to see what the JSON should look like is to define a job through the UI and then call api/2.0/jobs/get?job_id=<jobId> to inspect the JSON response.
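For example (using the same netrc-based -n authentication as above; the job ID is a placeholder):

curl -n 'https://yyy.azuredatabricks.net/api/2.0/jobs/get?job_id=<jobId>'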
Upvotes: 1