Saugat Mukherjee

Reputation: 990

Airflow DatabricksSubmitRunOperator does not take in notebook parameters

I am trying to trigger a notebook from Airflow. The notebook has parameters defined as widgets, and I am trying to pass values to them through the notebook_params argument. The run is triggered, but when I look at the submitted job, the parameters do not seem to be passed.

Example code:

new_cluster = {
    'spark_version': '6.5.x-cpu-ml-scala2.11',
    'node_type_id': 'Standard_DS3_v2',
    'num_workers': 4
}

notebook_task = DatabricksSubmitRunOperator(
    task_id='notebook_task',
    json={
        'new_cluster': new_cluster,
        'notebook_task': {
            'notebook_path': '/Users/[email protected]/Demo',
            'notebook_parameters': '{"fromdate": "20200420", "todate": "20200420", "datalakename": "exampledatalake", "dbname": "default", "filesystem": "refined", "tablename": "ntcsegmentprediction", "modeloutputpath": "curated"}'
        }
    }
)

However, DatabricksRunNowOperator supports it, and it works:

notebook_run = DatabricksRunNowOperator(
    task_id='notebook_task',
    job_id=24,
    notebook_params={"fromdate": "20200420", "todate": "20200420", "datalakename": "exampledatalake", "dbname": "default", "filesystem": "refined", "tablename": "ntcsegmentprediction", "modeloutputpath": "curated"}
)

The documentation and source code of DatabricksSubmitRunOperator say it can take in a notebook_task. If it can, I am not sure why it cannot take in parameters.

What am I missing?

If more information is required, I can provide that as well.

Upvotes: 6

Views: 5319

Answers (2)

rsanchezavalos

Reputation: 71

To use it with DatabricksSubmitRunOperator, you need to pass the parameters as base_parameters (a ParamPair map in the Jobs API) inside the notebook_task block of the json argument:

from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
from airflow.utils.trigger_rule import TriggerRule

notebook_task_params = {
    'new_cluster': cluster_def,
    'notebook_task': {
        'notebook_path': 'path',
        'base_parameters': {
            "param1": "**",
            "param2": "**"
        }
    }
}

notebook_task = DatabricksSubmitRunOperator(
    task_id='id***',
    dag=dag,
    trigger_rule=TriggerRule.ALL_DONE,
    json=notebook_task_params)

Then, in the notebook, you can use dbutils.widgets.get (or getArgument) to retrieve the values or set defaults:

# getArgument(name, default) returns the widget value, or the default if unset
param1 = getArgument("param1", "default")
param2 = getArgument("param2", "default")

From the Databricks docs: getArgument — (DEPRECATED) Equivalent to get.
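Since getArgument is deprecated, a minimal sketch of the non-deprecated equivalent (assuming a Python notebook) would be:

dbutils.widgets.text("param1", "default")  # define the widget with a default;
dbutils.widgets.text("param2", "default")  # a job run overrides it via base_parameters

# dbutils.widgets.get returns the current value of the named widget
param1 = dbutils.widgets.get("param1")
param2 = dbutils.widgets.get("param2")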

Upvotes: 4

Gustavo lee

Reputation: 56

You should use base_parameters instead of notebook_params. See the NotebookTask section of the Jobs API documentation:

https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobsnotebooktask
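Applied to the question's code, a minimal sketch (reusing the cluster spec and notebook path from the question) would look like this:

new_cluster = {
    'spark_version': '6.5.x-cpu-ml-scala2.11',
    'node_type_id': 'Standard_DS3_v2',
    'num_workers': 4
}

notebook_task = DatabricksSubmitRunOperator(
    task_id='notebook_task',
    json={
        'new_cluster': new_cluster,
        'notebook_task': {
            'notebook_path': '/Users/[email protected]/Demo',
            # base_parameters is a plain dict, not a JSON-encoded string
            'base_parameters': {
                "fromdate": "20200420",
                "todate": "20200420",
                "datalakename": "exampledatalake",
                "dbname": "default",
                "filesystem": "refined",
                "tablename": "ntcsegmentprediction",
                "modeloutputpath": "curated"
            }
        }
    }
)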

Upvotes: 4
