Reputation: 3
In a Synapse Analytics pipeline, I'm running a simple SQL script like the one below (the output currently returns fewer than 100 rows):
SELECT *
FROM myTable AS S
RIGHT OUTER JOIN CHANGETABLE(CHANGES myTable, 1) AS CT
    ON S.pk = CT.pk
The purpose is to output the result to a Storage Gen2 account as a Parquet file. Naturally, I implemented a Copy activity with my SQL query as the source (pointing at an Azure SQL DB) and the Storage Gen2 account as the sink. It works fine.
However, I noticed that the activity spends about 1 min 40 s in a waiting queue before being processed in a few seconds.
For testing purposes, I created a Script activity and pasted the same SQL query into it. It runs in 8 s, with no queue time.
What can explain such a difference, and is there a way to improve this? (I don't see any way other than a Copy activity to get my output stored as a Parquet file, but if there is a workaround, I'm happy to try it!)
Thanks!
Upvotes: 0
Views: 543
Reputation: 3
Answering because the problem is solved, thanks to @Pratik Lad and this article: https://asankap.wordpress.com/2021/10/26/why-you-shouldnt-use-auto-resolve-integration-runtime-in-azure-data-factory-or-synapse/
The trick is to use a custom Azure Integration Runtime instead of the default AutoResolveIntegrationRuntime. The custom one can be configured, and the main bottleneck was the Data Flow runtime "Time to live" value, which is set to 0 by default in the AutoResolveIntegrationRuntime. The connection was therefore closed after each run and reopened for the next one. (I forgot to mention that the Copy activity runs multiple times within my pipeline, inside a ForEach activity, and at each iteration the IR connection was closed and then reopened instead of being kept open.) Thanks for your help @PratikLad!
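For anyone hitting the same issue, a minimal sketch of what such a custom Azure IR definition can look like in JSON (the name, region, and core count here are placeholders, not values from my setup; the `timeToLive` value is in minutes and keeps the compute warm between runs):

```json
{
    "name": "MyCustomAzureIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 10
                }
            }
        }
    }
}
```

You then select this IR (instead of AutoResolveIntegrationRuntime) in the linked service or activity settings, so repeated runs in the ForEach reuse the same warm runtime instead of provisioning a new one each time.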
Upvotes: 0