Reputation: 990
I am trying to trigger a notebook from Airflow. The notebook has parameters defined as widgets, and I am trying to pass values to them through the notebook_params parameter. The run is triggered, but when I look at the submitted job, the parameters do not seem to be passed.
E.g. code:

new_cluster = {
    'spark_version': '6.5.x-cpu-ml-scala2.11',
    'node_type_id': 'Standard_DS3_v2',
    'num_workers': 4,
}

notebook_task = DatabricksSubmitRunOperator(
    task_id='notebook_task',
    json={
        'new_cluster': new_cluster,
        'notebook_task': {
            'notebook_path': '/Users/[email protected]/Demo',
            'notebook_parameters': '{"fromdate": "20200420", "todate": "20200420", "datalakename": "exampledatalake", "dbname": "default", "filesystem": "refined", "tablename": "ntcsegmentprediction", "modeloutputpath": "curated"}',
        },
    })
However, DatabricksRunNowOperator does support this, and it works:
notebook_run = DatabricksRunNowOperator(
    task_id='notebook_task',
    job_id=24,
    notebook_params={
        "fromdate": "20200420",
        "todate": "20200420",
        "datalakename": "exampledatalake",
        "dbname": "default",
        "filesystem": "refined",
        "tablename": "ntcsegmentprediction",
        "modeloutputpath": "curated",
    })
The documentation and source code of DatabricksSubmitRunOperator (here) say it can take a notebook_task. If it can, I am not sure why it cannot also take parameters.
What am I missing?
If more information is required, I can provide that as well.
Upvotes: 6
Views: 5319
Reputation: 71
To pass parameters with DatabricksSubmitRunOperator, you need to add them inside the json argument as base_parameters under notebook_task (a ParamPair in the Jobs API):
notebook_task_params = {
    'new_cluster': cluster_def,
    'notebook_task': {
        'notebook_path': 'path',
        'base_parameters': {
            "param1": "**",
            "param2": "**",
        },
    },
}

notebook_task = DatabricksSubmitRunOperator(
    task_id='id***',
    dag=dag,
    trigger_rule=TriggerRule.ALL_DONE,
    json=notebook_task_params)
Inside the notebook you can then use dbutils.widgets.get to retrieve a value, or getArgument to retrieve it with a fallback default:

param1 = getArgument("param1", "default")
param2 = getArgument("param2", "default")

Note that getArgument is deprecated and equivalent to dbutils.widgets.get.
Upvotes: 4
Reputation: 56
You should use base_parameters instead of notebook_params:
https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobsnotebooktask
Upvotes: 4
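Putting both answers together, the asker's original payload would be rewritten like this: a sketch using the values from the question, with notebook_parameters (a JSON string, which the Runs Submit API ignores) replaced by base_parameters (a nested dict). The operator call itself is commented out because it only runs inside an Airflow DAG.

```python
# Sketch of the corrected payload for DatabricksSubmitRunOperator.
# Key change: 'base_parameters' as a dict, not 'notebook_parameters'
# as a JSON-encoded string.
new_cluster = {
    'spark_version': '6.5.x-cpu-ml-scala2.11',
    'node_type_id': 'Standard_DS3_v2',
    'num_workers': 4,
}

notebook_task_params = {
    'new_cluster': new_cluster,
    'notebook_task': {
        'notebook_path': '/Users/[email protected]/Demo',
        # base_parameters is the key the Jobs Runs Submit API expects;
        # each entry arrives in the notebook as a widget value.
        'base_parameters': {
            'fromdate': '20200420',
            'todate': '20200420',
            'datalakename': 'exampledatalake',
            'dbname': 'default',
            'filesystem': 'refined',
            'tablename': 'ntcsegmentprediction',
            'modeloutputpath': 'curated',
        },
    },
}

# Inside a DAG definition, the operator would then be created as:
# notebook_task = DatabricksSubmitRunOperator(
#     task_id='notebook_task',
#     json=notebook_task_params)
```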