Brendan

Reputation: 57

Azure Data Factory V2 exploring pipeline dependencies

I am working with quite a lot of pipelines, and that involves a lot of dependencies between them.

This isn't ideal for a couple reasons:

Ideally I should be able to pick any pipeline and see which pipelines it depends on and which pipelines depend on it, i.e. what runs before and after it.

I was thinking about using the Data Factory SDKs to try to build the dependency structure of all my pipelines, but thought I would put this out there first to see if anyone has found a solution, or has any ideas, before I go down a rabbit hole.

I appreciate any advice.

Cheers, Brendan

Upvotes: 1

Views: 1477

Answers (2)

user1390375

Reputation: 746

Using the Az.DataFactory module (in PowerShell), start from the pipeline objects returned by Get-AzDataFactoryV2Pipeline:

$azDFPipelines = Get-AzDataFactoryV2Pipeline -ResourceGroupName "$azRG" -DataFactoryName "$AzADFName"

the "Activities" property can be expanded, as can its DependsOn property:

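# X is the index of the pipeline you are interested in; the pipeline's Name is renamed to ADFName so it doesn't clash with the activity's Name
$azDFPipelines[X] | select @{N='ADFName';E={$_.Name}} -ExpandProperty Activities | select ADFName, Name, Description -ExpandProperty DependsOn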

ADFName: name of the pipeline
Name: name of the activity/object in the pipeline
Description: description of the activity/object in the pipeline
DependsOn: data from the activity/object's dependencies (e.g., the objects it's "connected" to)

I've got a script that does this, and runs the output thru out-gridview. From there, I can then add different criteria fields to help look for stuff throughout the entire pipeline collection on my "server". Kinda helpful, if not really very user-friendly.

import-module Az.Accounts
import-module Az.DataFactory
$azAcct = Connect-AzAccount -subscription 'your_subscription_name'
#$azRgName, $azDFName are "empirically determined"
$azDFPipelines = Get-AzDataFactoryV2Pipeline -ResourceGroupName "$azRGname" -DataFactoryName "$azDFName" 
###need to coerce the Name property to ADFName because it's also a member of the Activities object/property...
$adf = $azDFPipelines | select-object @{N='ADFName';E={$_.name}},`
@{N='Activities';E={$_.Activities}}

###expand Activities, and also select just a few properties from Activities:
$adflist = $adf | select ADFName -ExpandProperty Activities | select ADFName, name, description, notebookpath, additionalproperties
$adfList | out-gridview

Next-level would be making it essentially walk the tree in a given pipeline or from a pipeline's specific pipeline object, and spit out graphviz "dot" or Mermaid graph language .md (or .vsdx...)
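As a rough illustration of that next step, here is a minimal Python sketch (Python rather than PowerShell, so it runs without the Az module) that turns a list of (upstream, downstream) name pairs into Mermaid flowchart text. The function name `to_mermaid` and the generated node ids (`n0`, `n1`, ...) are made up for this example; the pairs themselves would come from whatever dependency data you have already extracted.

```python
def to_mermaid(edges):
    """Render (upstream, downstream) name pairs as Mermaid flowchart text."""
    ids = {}

    def node_id(name):
        # Mermaid node ids cannot contain spaces, so give each name a short id.
        return ids.setdefault(name, f"n{len(ids)}")

    def label(name):
        # Double quotes would end the label early, so swap them out.
        return name.replace('"', "'")

    lines = ["graph LR"]
    for upstream, downstream in edges:
        a, b = node_id(upstream), node_id(downstream)
        lines.append(f'    {a}["{label(upstream)}"] --> {b}["{label(downstream)}"]')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid([("Copy raw", "Transform"), ("Transform", "Publish")]))
```

The output can be pasted into a fenced mermaid block in a .md file or into any Mermaid renderer.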

DependsOn can be extracted from the AdditionalProperties collection.

AdditionalProperties holds the name of the "next" thing to be run, what it is, etc.

As in SSIS, activities in an ADF pipeline run in parallel, in no guaranteed order, unless they are connected to each other in series. The same goes for any connected objects: they are invoked in parallel unless they are chained one after another.
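To make the parallel-versus-serial point concrete, here is a small Python sketch that groups activities into "waves" based on their dependsOn links. It assumes the activity shape found in a pipeline's exported JSON, where each activity has a "name" and an optional "dependsOn" list of {"activity": ...} entries; the function name `execution_waves` is hypothetical. It is a simplification: ADF starts each activity as soon as its own dependencies are satisfied rather than in lockstep waves, and the dependency conditions (Succeeded, Failed, and so on) are ignored here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_waves(activities):
    """Group activity names into waves with no ordering inside a wave."""
    # Map each activity to the set of activities it waits for.
    graph = {
        act["name"]: {dep["activity"] for dep in act.get("dependsOn", [])}
        for act in activities
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    demo = [
        {"name": "A"},
        {"name": "B"},
        {"name": "C", "dependsOn": [{"activity": "A"}, {"activity": "B"}]},
    ]
    print(execution_waves(demo))
```

In the demo, A and B are not connected to each other, so they land in the same wave; C waits for both.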

Upvotes: 0

DhruvJoshi

Reputation: 17146

Brendan, our ADF is connected to Git, so when I need to know what will be affected if I change a pipeline named, say, somePipelineName, I go to Git Bash and type

grep --color=always -4 "somePipelineName" * 

in the pipelines folder.

This helps me find all places from where the pipeline may be called.
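If grep output gets hard to read across many pipelines, the same idea can be scripted. Below is a minimal Python sketch, not a tested tool, that reads every pipeline JSON file in the Git folder and builds two lookups: which pipelines each pipeline calls, and which pipelines call it. It assumes one JSON file per pipeline with a top-level "name", and that child pipelines are referenced by an object with "type": "PipelineReference" and a "referenceName". The function names and the folder path you pass in are placeholders.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_pipeline_refs(node):
    """Yield every referenceName whose sibling "type" is "PipelineReference"."""
    if isinstance(node, dict):
        ref = node.get("referenceName")
        if node.get("type") == "PipelineReference" and isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from find_pipeline_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_pipeline_refs(item)


def build_dependency_maps(pipeline_dir):
    """Return (calls, called_by): pipeline name -> set of pipeline names."""
    calls = defaultdict(set)
    called_by = defaultdict(set)
    for path in sorted(Path(pipeline_dir).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        parent = doc.get("name", path.stem)  # fall back to the file name
        # The recursive walk also finds calls nested inside ForEach, If, etc.
        for child in find_pipeline_refs(doc):
            calls[parent].add(child)
            called_by[child].add(parent)
    return calls, called_by
```

With this, `called_by["somePipelineName"]` answers the same question as the grep above, and `calls["somePipelineName"]` gives the other direction.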

Update: 2020-09-17

I noticed today that we now have the related pipelines listing

Upvotes: 2
