whoopscheckmate
whoopscheckmate

Reputation: 876

How to Trigger ADF Pipeline from Synapse Pipelines

Problem

Due to internal requirements, I need to run a Synapse pipeline and then trigger an ADF pipeline. It does not seem that there is a Microsoft-approved method of doing this. The pipelines run infrequently (every week or month) and the ADF pipeline must run after the Synapse pipeline.

Options

It seems that other answers pose several options:

  1. Azure Functions. Create an Azure function that calls the CreatePipelineRun function on the ADF pipeline. At the end of the Synapse pipeline, insert a block that calls the Azure function.
  2. Use the REST API and Web Activity. Use the REST API to make a call to run the ADF pipeline. Insert a Web Activity block at the end of the Synapse pipeline to make the API call.
  3. Tables and polling. Insert a record into a table in a managed database with data about the Synapse pipeline run. Have regular polling from the ADF pipeline to check for new records and run when ready.
  4. Storage Event. Create a timestamped blob file at the end of the Synapse run. Use the "storage event trigger" within ADF to trigger the ADF pipeline.

Question

Which of these would be closest to the "approved" option? Are there any clear disadvantages to any of these?

Upvotes: 1

Views: 2370

Answers (1)

Joel Cochran
Joel Cochran

Reputation: 7728

As you mentioned, there is no "approved" solution for this problem. All the approaches you mentioned have pros and cons and should work. For me, Option #3 has been very successful. We have built a Queue Manager based on Tables & Stored Procedures in Azure SQL. We use Logic Apps to process the Triggers which can be Scheduled, Blob Events, or REST calls. Those Logic Apps insert jobs in the Queue table via Stored Procedure. That Stored Procedure can be called directly by virtually any system, so your Synapse pipeline could insert a Queue job to execute the ADF pipeline. Other benefits include a log of all the pipeline runs, support for multiple Data Factories (and now Synapse Workspaces), and a web interface we wrapped around the database for management and tracking.

We have 2 other Logic Apps that process the Queue (a Status manager and an Executor). These run constantly (every 1 minute and every 3 minutes). The actions to check status and create pipeline runs are both implemented as .NET Azure Functions [you'll need different SDKs for Synapse vs. ADF]. This system runs thousands of pipelines a month, sometimes more, across numerous Data Factories and Synapse Workspaces.

The PROs here are many, but this disconnected approach permits facets of your system to operate in isolation. And it is flexible, in that you can tie virtually any system into the Queue. Your example of a pipeline that needs to execute another pipeline in a different system is a perfect example.

The CON here is that this is the most involved approach. If this is a on-off problem you are trying to solve, choose one of the other options.

Upvotes: 1

Related Questions