Maryam-AM

Reputation: 99

ForEach is passing all the files at once in Azure Data Factory

I am a newbie in Azure and I am running into a problem with the ForEach activity.

Scenario: Every day, 6 files must be copied from Blob Storage into Azure SQL.

Day 1: Vendors-01, Transactions01, Orders01, Customers01, purchases01, history01
Day 2: Vendors-02, Transactions02, Orders02, Customers02, purchases02, history02
...

Problem: My ForEach is passing all the files at once on every iteration instead of looping through them and passing one at a time.

Ideally, it should work like this:

  1. 1st-loop: Vendors-files
  2. 2nd-loop: Transactions-files
  3. 3rd-loop: Orders-files
  4. 4th-loop: Customers-files
  5. 5th-loop: Purchases-files
  6. 6th-loop: History-files

What is happening currently:

  1. 1st-loop: Vendors-files, Transactions-files, Orders-files, Customers-files, Purchases-files, History-files
  2. 2nd-loop: Vendors-files, Transactions-files, Orders-files, Customers-files, Purchases-files, History-files ...

This causes the pipeline to fail because the ForEach is not iterating properly.

[Screenshot: the parent pipeline]

The actual file count is 6, which is why the pipeline gets executed 6 times.


The count for each loop of the pipeline gives me 24 files, instead of 3 similar files or 1 at a time.


What step am I missing?

Any feedback will be highly appreciated!

Thanks in advance 🙏

Upvotes: 1

Views: 2040

Answers (2)

Steve Johnson

Reputation: 8690

Maybe you can try this: use wildcard paths to copy the files from Blob Storage to the corresponding tables in Azure SQL.

My test:

  1. Create a variable of type Array whose values are your table names.
  2. Loop over this array with a ForEach activity.
  3. Use wildcard paths to filter the file names.
  4. Pass @item() to the sink dataset as the table name.
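
Something like this sketch in pipeline JSON (the dataset names BlobSourceDataset and SqlSinkDataset, the TableName dataset parameter, and the *.csv naming pattern are just placeholders — adjust them to your own datasets):

```json
{
  "name": "CopyDailyFiles",
  "properties": {
    "variables": {
      "TableNames": {
        "type": "Array",
        "defaultValue": ["Vendors", "Transactions", "Orders", "Customers", "Purchases", "History"]
      }
    },
    "activities": [
      {
        "name": "ForEachTable",
        "type": "ForEach",
        "typeProperties": {
          "isSequential": true,
          "items": { "value": "@variables('TableNames')", "type": "Expression" },
          "activities": [
            {
              "name": "CopyMatchingFiles",
              "type": "Copy",
              "inputs": [ { "referenceName": "BlobSourceDataset", "type": "DatasetReference" } ],
              "outputs": [
                {
                  "referenceName": "SqlSinkDataset",
                  "type": "DatasetReference",
                  "parameters": {
                    "TableName": { "value": "@item()", "type": "Expression" }
                  }
                }
              ],
              "typeProperties": {
                "source": {
                  "type": "DelimitedTextSource",
                  "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true,
                    "wildcardFileName": { "value": "@{item()}*.csv", "type": "Expression" }
                  }
                },
                "sink": { "type": "AzureSqlSink" }
              }
            }
          ]
        }
      }
    ]
  }
}
```

This way each iteration copies only the blobs whose names start with the current item, so the Vendors iteration picks up Vendors-01, Vendors-02, and so on, and writes them to the Vendors table.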

Upvotes: 1

Nandan

Reputation: 4945

Since the variable part of the job is the date appended at the end of each file, not the file names themselves, you can achieve the result with a couple of approaches:

  1. Create a config table of dates and their processed status.

ForEach loop activity: its items should be the extract dates that we need to process (meaning the dates for which we need to copy the files).

Within the ForEach, create 6 copy activities (one for each file name, like History, Orders, etc.) where the source dataset needs to be dynamic to fetch files with something like @concat('Orders', utcnow()) — see the sketch below.
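
For example, the dynamic source dataset could look something like this (the dataset name, linked service, container, and the ExtractDate parameter are placeholders; note that utcnow() returns a full timestamp, so in practice you would wrap it with formatDateTime(utcnow(), 'yyyyMMdd') or pass in the date from the config table):

```json
{
  "name": "OrdersSourceDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLS",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "parameters": {
      "ExtractDate": { "type": "String" }
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": {
          "value": "@concat('Orders', dataset().ExtractDate, '.csv')",
          "type": "Expression"
        }
      }
    }
  }
}
```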

  2. Rather than having 6 different copy activities within the ForEach (since ADF doesn't support a ForEach within a ForEach), create another pipeline that contains a variable of type Array whose values are the different file names. In the new pipeline, create a ForEach loop with that variable as the input, and within it a single copy activity.

And in the original pipeline, within the ForEach activity (based on the date parameter), call the new pipeline, as sketched below.
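
The call itself would be an Execute Pipeline activity inside the date ForEach, something like this sketch (the child pipeline name CopyFilesByName and the ExtractDate parameter are placeholders):

```json
{
  "name": "CallCopyFilesByName",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": {
      "referenceName": "CopyFilesByName",
      "type": "PipelineReference"
    },
    "parameters": {
      "ExtractDate": { "value": "@item()", "type": "Expression" }
    },
    "waitOnCompletion": true
  }
}
```

Inside the child pipeline, the ForEach over the file-name array would then build each blob name with something like @concat(item(), pipeline().parameters.ExtractDate, '.csv').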

Upvotes: 0
