Data_Insight

Reputation: 585

How to trigger an Azure Databricks notebook from Apache Airflow

I have created some ETL in an Azure Databricks notebook. Now I am trying to execute that notebook from Airflow 1.10.10.

If anyone can help, it would be great.

Thanks in advance.

Upvotes: 3

Views: 1394

Answers (1)

Alex Ott

Reputation: 87369

Airflow includes a native integration with Databricks that provides two operators: DatabricksRunNowOperator and DatabricksSubmitRunOperator (the package name differs depending on the Airflow version). There is also an example of how they can be used.
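For instance, a minimal sketch of DatabricksRunNowOperator on Airflow 1.10.x, assuming a job already exists in the Databricks workspace (the job_id and notebook_params values here are hypothetical placeholders):

    from airflow.contrib.operators.databricks_operator import DatabricksRunNowOperator

    # Trigger an existing Databricks job by its ID; notebook_params are
    # passed through to the notebook's widgets.
    run_now_task = DatabricksRunNowOperator(
        task_id='run_now_task',
        job_id=42,  # placeholder: the ID of a job already defined in Databricks
        notebook_params={'run_date': '{{ ds }}'},
    )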

You will need to create a connection named databricks_default with the login parameters that will be used to schedule your job. In the simplest case, the job just needs a cluster definition and a notebook specification (at least the path of the notebook to run), something like this:

    # In Airflow 1.10.x the Databricks operators live in airflow.contrib;
    # in Airflow 2+ they come from the apache-airflow-providers-databricks package.
    from airflow.contrib.operators.databricks_operator import DatabricksSubmitRunOperator

    new_cluster = {
        'spark_version': '7.3.x-scala2.12',  # example values; adjust to your workspace
        'node_type_id': 'Standard_DS3_v2',
        'num_workers': 2,
    }

    notebook_task_params = {
        'new_cluster': new_cluster,
        'notebook_task': {
            'notebook_path': '/Users/airflow@example.com/PrepareData',
        },
    }
    # Example of using the JSON parameter to initialize the operator.
    notebook_task = DatabricksSubmitRunOperator(
        task_id='notebook_task',
        json=notebook_task_params,
    )
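If you prefer to create the databricks_default connection in code rather than through the Airflow UI (Admin -> Connections), a sketch along these lines should work; the host and token values are placeholders for your own workspace URL and personal access token:

    import json

    from airflow import settings
    from airflow.models import Connection

    # Placeholder values: replace with your workspace URL and a personal access token.
    conn = Connection(
        conn_id='databricks_default',
        conn_type='databricks',
        host='https://<your-workspace>.azuredatabricks.net',
        extra=json.dumps({'token': '<personal-access-token>'}),
    )

    session = settings.Session()
    session.add(conn)
    session.commit()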

P.S. There is an old blog post announcing this integration.

Upvotes: 1
