Reputation: 1613
I want to create an Azure Batch activity in my Data Factory pipeline. I set up a trigger that checks for blobs whose "last modified" time falls within the last 24 hours.
As I'm dealing with big files, I want to leverage the power of Azure Batch and process 2 blobs at a time on the same machine.
This is the pipeline I've built so far:
The second activity manipulates the output of the previous one, creating a list variable of {container name}/{blob} paths.
How can I divide my blob addresses into small batches so that I can feed them to the next Batch activity?
Thanks
Upvotes: 1
Views: 131
Reputation: 14379
The 'ForEach' activity runs in parallel by default, with a batch count of 20 concurrent iterations, configurable up to 50. Make sure the 'Sequential' box on your ForEach is unchecked:
If you need to group items into larger batches, e.g. 3 per batch or 5 per batch, that is a bit trickier; I would look at something like a Stored Procedure activity, a Databricks notebook, or a Synapse notebook to do that slightly more complex work.
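As a rough sketch of what that notebook step could do, here is one way to split a flat list of `{container name}/{blob}` paths into fixed-size batches in Python. The paths and the batch size of 2 are illustrative assumptions, not values from your pipeline; the resulting list of lists could then be passed to the ForEach feeding your Batch activity.

```python
# Hypothetical example: group "{container}/{blob}" paths into batches
# of a fixed size, e.g. inside a Databricks or Synapse notebook activity.

def chunk(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Assumed sample input; in practice this would come from the previous activity.
blob_paths = [
    "mycontainer/file1.csv",
    "mycontainer/file2.csv",
    "mycontainer/file3.csv",
    "mycontainer/file4.csv",
    "mycontainer/file5.csv",
]

# Two blobs per batch, matching the goal of processing 2 at a time.
batches = list(chunk(blob_paths, 2))
print(batches)
```

Each inner list here represents one unit of work for the Batch activity; an odd-length input simply yields a final batch with fewer items.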
Upvotes: 1