ZCoder

Reputation: 2349

Azure Data Flow takes minutes to trigger the next pipeline

Azure Data Factory transfers the data to the database in about 10 milliseconds, but it then waits several minutes before triggering the next pipeline, so the whole run ends up taking 40 minutes. Every pipeline takes less than 20 ms to transfer its data, yet somehow each one waits a few minutes before the next is triggered.

I tried debug mode, and I also triggered the ADF pipeline from a Logic App without debug mode. Is there any way to optimize this? We want to move from SSIS to Data Flow, but this timing issue is a blocker: 40 minutes is far too long, and in the next step we will be processing millions of records.

So it took 7 seconds to transfer the data to the database, but then it waited for 6 minutes.


Upvotes: 1

Views: 1973

Answers (2)

Mark Kromer MSFT

Reputation: 3838

You will hit the Databricks cluster spin-up time during job (triggered) execution.

As long as you are in Debug mode, you'll always hit a warmed cluster while the debug session is still green.

We've added TTL to the Azure IR in the Data Flow configuration section so that you can keep a cluster alive for your next data flow activity and you won't incur the start-up penalty on your next execution.

Note that this option is greyed out at this time, but we will enable it soon.
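As an illustration of the TTL setting described above, a Managed Azure IR with data flow compute properties can be defined in the factory's JSON roughly like this (the IR name is a placeholder, and `timeToLive` is in minutes; check the current ADF schema before relying on it):

```json
{
  "name": "DataFlowWarmIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "timeToLive": 15
        }
      }
    }
  }
}
```

With a non-zero `timeToLive`, the Spark cluster stays warm after a data flow activity finishes, so a subsequent activity that uses the same IR can skip the several-minute spin-up.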

Upvotes: 2

Leon Yue

Reputation: 16431

The document Monitor data flow performance mentions:

Note that you can assume 1 minute of cluster job execution set-up time in your overall performance calculations and if you are using the default Azure Integration Runtime, you may need to add 5 minutes of cluster spin-up time as well.

That may be the reason. You can start with this tutorial: Mapping data flows performance and tuning guide.

The document Execute data flow activity in Azure Data Factory can also help improve performance:

Choose the compute environment for this execution of your data flow. The default is the Azure Auto-Resolve Default Integration Runtime. This choice will execute the data flow on the Spark environment in the same region as your data factory. The compute type will be a job cluster, which means the compute environment will take several minutes to start-up.

You have control over the Spark execution environment for your Data Flow activities. In the Azure integration runtime are settings to set the compute type (general purpose, memory optimized, and compute optimized), number of worker cores, and time-to-live to match the execution engine with your Data Flow compute requirements. Also, setting TTL will allow you to maintain a warm cluster that is immediately available for job executions.

Note:

The Integration Runtime selection in the Data Flow activity only applies to triggered executions of your pipeline. Debugging your pipeline with Data Flows with Debug will execute against the 8-core default Spark cluster.
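To sketch what the note above means in practice: a triggered pipeline's Execute Data Flow activity must explicitly reference the custom IR for the warm-cluster settings to apply. Assuming a hypothetical custom Azure IR named `DataFlowWarmIR` has been configured with a TTL, the activity JSON might look like this (all names are placeholders):

```json
{
  "name": "RunMyDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "MyDataFlow",
      "type": "DataFlowReference"
    },
    "integrationRuntime": {
      "referenceName": "DataFlowWarmIR",
      "type": "IntegrationRuntimeReference"
    },
    "compute": {
      "computeType": "General",
      "coreCount": 8
    }
  }
}
```

If the `integrationRuntime` reference is omitted, the activity falls back to the Auto-Resolve default IR and pays the job-cluster start-up time on every triggered run, which matches the multi-minute gaps described in the question.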

Hope this helps.

Upvotes: 3
