Reputation: 61
I have created a pipeline in Azure Data Factory (ADF). I created a Databricks workspace, a notebook (with some code), and a cluster. I created the linked service connection from ADF to Databricks and tested it. All lights are green. I published the ADF pipeline.
When I trigger the pipeline, it reports SUCCESS, but nothing happens in Databricks: no job is created, and the code in the notebook cell is apparently never executed. (I know this because the code prints the current time.)
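For reference, the cell is nothing fancy; it is roughly equivalent to this (paraphrased, since the point is just to prove the notebook ran):

```python
# Databricks notebook cell: print the current time so any run is visible in the output
from datetime import datetime

print(f"Notebook executed at {datetime.now().isoformat()}")
```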
Has anyone done this successfully?
To be clear, I want Data Factory to use an existing cluster in Databricks, not create a new one. I have specified the existing cluster in the pipeline setup parameters.
Upvotes: 1
Views: 6793
Reputation: 61
Solved. The problem was that the notebook (containing my code) was inside my user folder (under /Users). Data Factory did not have permission to see or run it. I recreated the same notebook in the Shared folder (under /Shared) and everything works fine.
I will point out that ADF should issue an error/warning if the named notebook cannot be seen or used. The ADF pipeline validated fine and reported a successful run, but just failed silently.
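If anyone hits the same thing, you can sanity-check what the linked service's identity can actually see by calling the Databricks Workspace API with the same access token configured in ADF. A minimal sketch (host, token, and notebook path are placeholders):

```python
# Check whether the token used by the ADF linked service can see the notebook.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
TOKEN = "dapi..."  # the same personal access token configured in the ADF linked service
NOTEBOOK_PATH = "/Shared/my_notebook"  # hypothetical path referenced by the Notebook activity

resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/workspace/get-status",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": NOTEBOOK_PATH},
)

if resp.status_code == 200:
    print("Visible to this token:", resp.json())
else:
    # A 404 (RESOURCE_DOES_NOT_EXIST) here is the same visibility problem
    # that ADF apparently swallowed silently.
    print("Not visible:", resp.status_code, resp.text)
```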
Upvotes: 1
Reputation: 16401
Please reference this tutorial: Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory.
In this tutorial, you use the Azure portal to create an Azure Data Factory pipeline that executes a Databricks notebook against the Databricks jobs cluster. It also passes Azure Data Factory parameters to the Databricks notebook during execution.
The tutorial walks through the setup end to end. The one difference for your case is that you don't need to create a new job cluster; select "use an existing cluster" instead.
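On the parameter passing the tutorial mentions: inside the notebook, you read the values with widgets. A minimal sketch, assuming the Notebook activity defines a baseParameters entry named "input" (the name is just for illustration):

```python
# Read a parameter passed from ADF's Notebook activity baseParameters.
# dbutils is provided automatically in Databricks notebooks (no import needed).
input_value = dbutils.widgets.get("input")
print(f"Received from ADF: {input_value}")
```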
Hope this helps.
Upvotes: 1