OreoFanatics

Reputation: 898

Iterate each folder in Azure Data Factory

In our Data Lake storage, we receive an unspecified number of folders every day. Each of these folders contains at least one file.

Example of folders:

    FolderA
    |_/2020
       |_/03
          |_/12
             |_fileA.json
       |_/04
          |_/13
             |_fileB.json
    FolderB
    |_/2020
       |_/03
          |_/12
             |_fileC.json
    FolderC/...
    FolderD/...
    and so on...

Now:

  1. How do I iterate over every folder and get the file(s) inside it?

  2. I would also like to use 'Copy Data' on each of these files and merge them into a single .csv file. What would be the best approach to achieve this?

Upvotes: 0

Views: 569

Answers (1)

Martin Esteban Zurita

Reputation: 3209

This can be done with a single Copy activity, using wildcard file filtering in the source, as described here: https://azure.microsoft.com/en-us/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/
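As a rough sketch of what the source side of that Copy activity could look like in pipeline JSON, assuming the source is a JSON dataset on ADLS Gen2 (the wildcard pattern here is just an illustration, adjust it to your folder layout):

    "source": {
        "type": "JsonSource",
        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": true,
            "wildcardFileName": "*.json"
        }
    }

With recursive set to true and a wildcard file name, the activity should pick up every .json file under the dataset's folder path, regardless of how many subfolders arrive each day.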

Then, in the Sink tab of the Copy activity, select "Merge files" as the Copy behavior, as seen here:

[Screenshot: Copy behavior set to "Merge files" in the sink settings]
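Again as a minimal sketch in pipeline JSON, assuming the sink is a DelimitedText (CSV) dataset on ADLS Gen2, the merge behavior on the sink side could look like this:

    "sink": {
        "type": "DelimitedTextSink",
        "storeSettings": {
            "type": "AzureBlobFSWriteSettings",
            "copyBehavior": "MergeFiles"
        },
        "formatSettings": {
            "type": "DelimitedTextWriteSettings",
            "fileExtension": ".csv"
        }
    }

"MergeFiles" writes all matched source files into a single output file in the sink folder, which gives you the single .csv you are after (you will still need a schema mapping from the JSON structure to flat columns).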

If you have extra requirements, another way to do this is by using Mapping Dataflows. Mark Kromer explains a similar scenario here: https://kromerbigdata.com/2019/07/05/adf-mapping-data-flows-iterate-multiple-files-with-source-transformation/

Hope this helped!

Upvotes: 1
