Mr incognito

Reputation: 13

Nested ForEach in ADF (Azure Data Factory)

I have a set of JSON files that I want to loop through; in each file there is a field containing a list of links that point to an image. The goal is to download each image from those links using the Binary format (I tested this with several links and it already works). My problem is the nested ForEach: I manage to loop over all the JSON files, but when I add a second ForEach to loop over the links and run a Copy data activity to download the images via an Execute Pipeline, I get this error:

"ErrorCode=InvalidTemplate, ErrorMessage=cannot reference action 'Copy data1'. Action 'Copy data1' must either be in 'runAfter' path, or be a Trigger"

Example of file:

t1.json

{
   "type": "jean",
   "image":[
      "pngmart.com/files/7/Denim-Jean-PNG-Transparent-Image.png",
      "https://img2.freepng.fr/20171218/882/men-s-jeans-png-image-5a387658387590.0344736015136497522313.jpg",
      "https://img2.freepng.fr/20171201/ed5/blue-jeans-png-image-5a21ed9dc7f436.281334271512172957819.jpg"
   ]
}

t2.json

{
   "type": "socks",
   "image":[
      "https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Fun_socks.png/667px-Fun_socks.png",
      "https://upload.wikimedia.org/wikipedia/commons/e/ed/Bulk_tube_socks.png",
      "https://cdn.picpng.com/socks/socks-face-30640.png"
   ]
}

Do you have a solution?

Thanks

Upvotes: 1

Views: 6237

Answers (2)

NiharikaMoola

Reputation: 5074

I have reproduced this and was able to copy all the links by looping a Copy data activity inside a ForEach activity and using the Execute Pipeline activity.

Parent pipeline:

  1. If you have multiple JSON files, get the list of files using the Get Metadata activity.

  2. Loop over the child items using the ForEach activity and add the Execute Pipeline activity to get the data from each file, passing the current item as a parameter (@item().name). A sketch of this parent pipeline's JSON is shown below.

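For reference, the parent pipeline JSON could look roughly like this. The dataset name (JsonFolderDataset), the child pipeline name (ChildPipeline), and the parameter name (filename) are placeholders, so adjust them to your own setup:

{
    "name": "ParentPipeline",
    "properties": {
        "activities": [
            {
                "name": "Get Metadata1",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": { "referenceName": "JsonFolderDataset", "type": "DatasetReference" },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                    "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
                    "activities": [
                        {
                            "name": "Execute Pipeline1",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": { "referenceName": "ChildPipeline", "type": "PipelineReference" },
                                "waitOnCompletion": true,
                                "parameters": {
                                    "filename": { "value": "@item().name", "type": "Expression" }
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}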

Child pipeline:

  1. Create a parameter to store the file name from the parent pipeline.

  2. Using the Lookup activity, get the data from the current JSON file (a sketch of the child pipeline follows below).

Filename property: @pipeline().parameters.filename

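A rough sketch of the child pipeline with its filename parameter and the Lookup activity; JsonFileDataset is a placeholder for a JSON dataset that uses its own filename parameter in the file path, and the source settings are simplified here:

{
    "name": "ChildPipeline",
    "properties": {
        "parameters": {
            "filename": { "type": "String" }
        },
        "activities": [
            {
                "name": "Lookup1",
                "type": "Lookup",
                "typeProperties": {
                    "source": { "type": "JsonSource" },
                    "dataset": {
                        "referenceName": "JsonFileDataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "filename": { "value": "@pipeline().parameters.filename", "type": "Expression" }
                        }
                    },
                    "firstRowOnly": false
                }
            }
        ]
    }
}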

Note that I have added https:// to your first image link because, without it, the link does not validate in the copy activity and gives an error.

  3. Pass the Lookup output to the ForEach activity and loop through each image value, as in the sketch after this step.

@activity('Lookup1').output.value[0].image

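The ForEach inside the child pipeline would then look roughly like this (the Copy data activity is only stubbed here and is detailed further down; activity names are examples):

{
    "name": "ForEach2",
    "type": "ForEach",
    "dependsOn": [ { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": { "value": "@activity('Lookup1').output.value[0].image", "type": "Expression" },
        "activities": [
            { "name": "Copy data1", "type": "Copy" }
        ]
    }
}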

  4. Add a Copy data activity inside the ForEach activity to copy each link from source to sink.

  5. I have created a binary dataset with the HttpServer linked service and created a parameter for the Base URL in the linked service.

  6. Pass the linked service parameter value from the dataset.

  7. Pass the dataset parameter value from the copy activity source so that the current item (link) is used in the linked service; see the sketch below.

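Putting the last three steps together, here is a rough sketch of the parameterized HttpServer linked service, the binary dataset that forwards its parameter to the linked service, and the Copy data activity source that passes the current item. All names (HttpServer1, BinaryHttp1, BinarySinkDataset, imageurl) are placeholders, and the sink shown assumes Blob storage:

Linked service:

{
    "name": "HttpServer1",
    "properties": {
        "type": "HttpServer",
        "parameters": {
            "BaseURL": { "type": "String" }
        },
        "typeProperties": {
            "url": "@{linkedService().BaseURL}",
            "authenticationType": "Anonymous",
            "enableServerCertificateValidation": true
        }
    }
}

Dataset:

{
    "name": "BinaryHttp1",
    "properties": {
        "type": "Binary",
        "parameters": {
            "imageurl": { "type": "String" }
        },
        "linkedServiceName": {
            "referenceName": "HttpServer1",
            "type": "LinkedServiceReference",
            "parameters": {
                "BaseURL": { "value": "@dataset().imageurl", "type": "Expression" }
            }
        },
        "typeProperties": {
            "location": { "type": "HttpServerLocation" }
        }
    }
}

Copy data activity:

{
    "name": "Copy data1",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "BinaryHttp1",
            "type": "DatasetReference",
            "parameters": {
                "imageurl": { "value": "@item()", "type": "Expression" }
            }
        }
    ],
    "outputs": [
        { "referenceName": "BinarySinkDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": { "type": "HttpReadSettings" }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": { "type": "AzureBlobStorageWriteSettings" }
        }
    }
}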

Upvotes: 0

wBob

Reputation: 14399

As per the documentation you cannot nest For Each activities in Azure Data Factory (ADF) or Synapse Pipelines, but you can use the Execute Pipeline activity to create nested pipelines, where the parent has a For Each activity and the child pipeline does too. You can also chain For Each activities one after the other, but not nest them.

Excerpt from the documentation:

Limitation: You can't nest a ForEach loop inside another ForEach loop (or an Until loop).
Workaround: Design a two-level pipeline where the outer pipeline with the outer ForEach loop iterates over an inner pipeline with the nested loop.
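A minimal sketch of that two-level pattern, with placeholder pipeline, parameter, and activity names; the inner pipeline would contain its own ForEach loop, just like the child pipeline in the other answer here:

{
    "name": "OuterPipeline",
    "properties": {
        "parameters": {
            "outerItems": { "type": "Array" }
        },
        "activities": [
            {
                "name": "Outer ForEach",
                "type": "ForEach",
                "typeProperties": {
                    "items": { "value": "@pipeline().parameters.outerItems", "type": "Expression" },
                    "activities": [
                        {
                            "name": "Execute inner pipeline",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": { "referenceName": "InnerPipeline", "type": "PipelineReference" },
                                "waitOnCompletion": true
                            }
                        }
                    ]
                }
            }
        ]
    }
}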

It may be that multiple nested pipelines are not what you want, in which case you could hand this looping off to another activity, e.g. a Stored Procedure, a Databricks Notebook, or a Synapse Notebook (if you're in Azure Synapse Analytics). One example here might be to load the JSON files into a table (or dataframe), extract the filenames once and then loop through that list, rather than through each file. Just an idea.

Upvotes: 1
