Reputation: 57
I am working with quite a lot of pipelines, and that involves a lot of dependencies between them.
This isn't ideal for a couple reasons:
Ideally I should be able to "select" any pipeline and know which pipeline dependencies it has, both before and after its execution.
I was thinking about using the Data Factory SDKs to build the dependency structure of all my pipelines, but thought I would chuck this out there to see if anyone has discovered a solution for this, or has any ideas, before I go down a rabbit hole.
I appreciate any advice.
Cheers, Brendan
Upvotes: 1
Views: 1477
Reputation: 746
Using the Az.DataFactory module (in PowerShell), start from the pipeline objects returned by Get-AzDataFactoryV2Pipeline:
$azDFPipelines = Get-AzDataFactoryV2Pipeline -ResourceGroupName "$azRGname" -DataFactoryName "$azDFName"
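The "Activities" property of a pipeline object can be expanded, as can each activity's DependsOn property (X is the index of the pipeline you want; the pipeline's Name is renamed to ADFName first so it doesn't clash with the activity's own Name):
$azDFPipelines[X] | select @{N='ADFName';E={$_.Name}}, Activities | select ADFName -ExpandProperty Activities | select ADFName, Name, Description -ExpandProperty DependsOn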
ADFName: name of the pipeline
Name: name of the activity/object in the pipeline
Description: description of the activity/object in the pipeline
DependsOn: data from the activity/object's dependencies (e.g., the objects it's "connected" to)
I've got a script that does this and runs the output through Out-GridView. From there, I can add different criteria fields to help look for stuff throughout the entire pipeline collection on my "server". Kinda helpful, if not really very user-friendly.
import-module Az.Accounts
import-module Az.DataFactory
$azAcct = Connect-AzAccount -subscription 'your_subscription_name'
#$azRgName, $azDFName are "empirically determined"
$azDFPipelines = Get-AzDataFactoryV2Pipeline -ResourceGroupName "$azRGname" -DataFactoryName "$azDFName"
###need to coerce the Name property to ADFName because it's also a member of the Activities object/property...
$adf = $azDFPipelines | select-object @{N='ADFName';E={$_.name}},`
@{N='Activities';E={$_.Activities}}
###expand Activities, and also select just a few properties from Activities:
$adflist = $adf | select ADFName -ExpandProperty Activities | select ADFName, name, description, notebookpath, additionalproperties
$adfList | out-gridview
Next-level would be making it essentially walk the tree in a given pipeline or from a pipeline's specific pipeline object, and spit out graphviz "dot" or Mermaid graph language .md (or .vsdx...)
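A rough sketch of that "next-level" idea, emitting Mermaid text for a single pipeline. ConvertTo-MermaidGraph is a hypothetical helper name, and the sample objects only mimic the shape assumed here: a Name, plus DependsOn entries with an Activity property naming the upstream activity. The objects returned by Az.DataFactory may expose this differently (this answer mentions AdditionalProperties), so treat the property names as assumptions to adjust.

```powershell
# Hypothetical helper: turn one pipeline's activities into Mermaid flowchart text.
# Assumes each activity has Name and DependsOn, and each DependsOn entry has
# an Activity property holding the upstream activity's name.
function ConvertTo-MermaidGraph {
    param(
        [Parameter(Mandatory)] [string]   $PipelineName,
        [Parameter(Mandatory)] [object[]] $Activities
    )

    $lines = @('flowchart LR', "    %% pipeline: $PipelineName")

    # Give every activity a short node id (a0, a1, ...) and a quoted label.
    $ids = @{}
    $i = 0
    foreach ($activity in $Activities) {
        $ids[$activity.Name] = "a$i"
        $lines += '    a{0}["{1}"]' -f $i, $activity.Name
        $i++
    }

    # One edge per dependency, pointing from the upstream activity to the dependent one.
    foreach ($activity in $Activities) {
        foreach ($dep in @($activity.DependsOn)) {
            if ($null -eq $dep -or -not $dep.Activity) { continue }
            if (-not $ids.ContainsKey($dep.Activity)) { continue }
            $lines += '    {0} --> {1}' -f $ids[$dep.Activity], $ids[$activity.Name]
        }
    }

    $lines -join "`n"
}

# Made-up sample data standing in for one pipeline's Activities collection.
$sample = @(
    [pscustomobject]@{ Name = 'Copy raw';    DependsOn = @() }
    [pscustomobject]@{ Name = 'Load lookup'; DependsOn = @() }
    [pscustomobject]@{ Name = 'Transform';   DependsOn = @(
        [pscustomobject]@{ Activity = 'Copy raw' }
        [pscustomobject]@{ Activity = 'Load lookup' }
    ) }
)

$mermaid = ConvertTo-MermaidGraph -PipelineName 'demo' -Activities $sample
$mermaid
```

The text can be pasted into a Mermaid code block in a .md file. Activity names containing double quotes would need escaping, which this sketch doesn't handle.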
The DependsOn data can be extracted from the AdditionalProperties collection.
AdditionalProperties holds the name of the "next" thing to be run, what it is, etc.
As in SSIS, execution flow through an ADF pipeline is "parallel": activities are invoked non-deterministically unless they're connected to each other serially. The same goes for connected objects; they're invoked "in parallel" unless they're connected in series.
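To make the parallel-versus-serial point concrete, here is a small sketch that groups activities into "waves": everything in wave 0 has no dependencies and can start together, wave 1 only depends on wave 0, and so on. Get-ActivityWave is a hypothetical name, the sample data is made up, and the same Name/DependsOn/Activity shape is assumed. It is a simplification: ADF starts an activity as soon as its own dependencies finish rather than waiting for a whole wave, and dependency conditions (Succeeded, Failed, etc.) are ignored here.

```powershell
# Hypothetical helper: assign each activity to the earliest "wave" it could run in.
function Get-ActivityWave {
    param([Parameter(Mandatory)] [object[]] $Activities)

    $finished  = @{}              # names of activities already placed in a wave
    $remaining = @($Activities)
    $wave      = 0

    while ($remaining.Count -gt 0) {
        # An activity is ready when every upstream activity has already been placed.
        $ready = @(foreach ($activity in $remaining) {
            $blocked = $false
            foreach ($dep in @($activity.DependsOn)) {
                if ($null -ne $dep -and -not $finished.ContainsKey($dep.Activity)) {
                    $blocked = $true
                }
            }
            if (-not $blocked) { $activity }
        })

        if ($ready.Count -eq 0) { throw 'Cycle or unknown dependency found.' }

        foreach ($activity in $ready) {
            $finished[$activity.Name] = $true
            [pscustomobject]@{ Wave = $wave; Activity = $activity.Name }
        }

        $remaining = @($remaining | Where-Object { -not $finished.ContainsKey($_.Name) })
        $wave++
    }
}

# Made-up sample: A and B are unconnected, C waits for both, D waits for C.
$activities = @(
    [pscustomobject]@{ Name = 'A'; DependsOn = @() }
    [pscustomobject]@{ Name = 'B'; DependsOn = @() }
    [pscustomobject]@{ Name = 'C'; DependsOn = @(
        [pscustomobject]@{ Activity = 'A' }
        [pscustomobject]@{ Activity = 'B' }
    ) }
    [pscustomobject]@{ Name = 'D'; DependsOn = @([pscustomobject]@{ Activity = 'C' }) }
)

$waves = @(Get-ActivityWave -Activities $activities)
$waves | Format-Table
```

With this sample, A and B land in wave 0 (no ordering between them), C in wave 1 and D in wave 2.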
Upvotes: 0
Reputation: 17146
Brendan, our ADF is connected to git, so when I need to know what will be affected if I change a pipeline named, say, somePipelineName, I go to git bash and run
grep --color=always -4 "somePipelineName" *
in the pipelines folder.
This helps me find all the places the pipeline may be called from.
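The same search can be done structurally instead of by text match. The sketch below assumes the usual layout of a git-integrated factory, where each pipeline is a JSON file and an Execute Pipeline activity carries the target name in typeProperties.pipeline.referenceName. Get-PipelineCaller is a hypothetical helper name and the demo files are made up. It only looks at top-level activities, so calls nested inside ForEach, If or Switch containers are missed; grep still catches those.

```powershell
# Hypothetical helper: list pipelines whose top-level Execute Pipeline activities
# call the given pipeline. Assumes one pipeline JSON file per pipeline in the folder.
function Get-PipelineCaller {
    param(
        [Parameter(Mandatory)] [string] $PipelineName,
        [Parameter(Mandatory)] [string] $PipelineFolder
    )

    foreach ($file in (Get-ChildItem -Path $PipelineFolder -Filter '*.json')) {
        $pipeline = Get-Content -Path $file.FullName -Raw | ConvertFrom-Json
        foreach ($activity in @($pipeline.properties.activities)) {
            if ($activity.type -eq 'ExecutePipeline' -and
                $activity.typeProperties.pipeline.referenceName -eq $PipelineName) {
                [pscustomobject]@{ Caller = $pipeline.name; Activity = $activity.name }
            }
        }
    }
}

# Demo with two made-up pipeline files in a temporary folder.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("adf-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $folder -Force | Out-Null

@'
{ "name": "parent", "properties": { "activities": [
  { "name": "Run child", "type": "ExecutePipeline",
    "typeProperties": { "pipeline": { "referenceName": "somePipelineName",
                                      "type": "PipelineReference" } } } ] } }
'@ | Set-Content -Path (Join-Path $folder 'parent.json')

@'
{ "name": "somePipelineName", "properties": { "activities": [
  { "name": "Wait a bit", "type": "Wait",
    "typeProperties": { "waitTimeInSeconds": 1 } } ] } }
'@ | Set-Content -Path (Join-Path $folder 'somePipelineName.json')

$callers = @(Get-PipelineCaller -PipelineName 'somePipelineName' -PipelineFolder $folder)
$callers | Format-Table

Remove-Item -Path $folder -Recurse -Force
```

Swapping the comparison around (collecting every referenceName per file) would give the other direction, i.e. what a given pipeline calls.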
Update: 2020-09-17
I noticed today that we now have the related pipelines listing
Upvotes: 2