simplicITy

Reputation: 101

How to skip open Excel files when copying a directory in Azure Data Factory

I have a data ingestion pipeline in Data Factory which is fed a parameter. Based on this parameter, a Lookup activity returns file share directories along with a couple of other pieces of metadata. These directories are then passed to a ForEach activity.

Inside this ForEach is a Copy Data activity with a source dataset (dataload_dir) and a sink dataset (our data lake). The dataload_dir dataset takes a FilePattern parameter, which the Copy activity sets to the expression @concat('*',pipeline().parameters.FilePattern,'*'), and its File Path property is @item().DS_Directory_Name / @dataset().FilePattern. The FilePattern parameter is part of the metadata returned by the Lookup above.
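As a minimal sketch of that wiring (assuming a file-share/Excel dataset; property names follow the ADF dataset JSON schema, and the @item() expression resolves at runtime inside the ForEach), the source dataset might look like:

```json
{
  "name": "dataload_dir",
  "properties": {
    "type": "Excel",
    "parameters": {
      "FilePattern": { "type": "string" }
    },
    "typeProperties": {
      "location": {
        "type": "FileServerLocation",
        "folderPath": { "value": "@item().DS_Directory_Name", "type": "Expression" },
        "fileName": { "value": "@dataset().FilePattern", "type": "Expression" }
      }
    }
  }
}
```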

As an example, the ForEach is passed a DS_Directory_Name of \\root\directoryA\subdirectoryB\ and a FilePattern of .*x. The Copy Data activity then loads all of the Excel files within subdirectoryB to our Data Lake.

This has been working as expected, but we have now run into an issue where the pipeline errors out because a user has one of the Excel files open and therefore locked. Any ideas on a way around this? Because we have this set up as a directory loader, there's no step that outputs the files top be copied so I can't just filter out files that start with ~$. Also, this Copy Data activity is already in a ForEach so I can't add a sub ForEach activity in Data Factory. I can't use Fault Tolerance in the Copy Data activity because again, it's loading a directory not a file/table. I'm open to suggestions here.

Upvotes: 0

Views: 413

Answers (1)

Pratik Lad

Reputation: 8291

  • The ForEach activity in this case copies each Excel file from the source directory to the destination. If a file is open (and therefore locked), the Copy activity for that file fails.

  • However, when a Copy activity inside a ForEach fails, the pipeline execution does not fail or stop. The error is raised for that particular Copy activity, but the Copy activity continues to execute for the remaining files.

Look at the following demonstration for a clearer understanding.

  • The file inspect under the f1 folder is open on my system.

  • Inside the ForEach, I have a Copy activity along with a Wait activity (the Wait runs only if the Copy activity fails).
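A minimal sketch of that arrangement (activity names are illustrative): the Wait activity depends on the Copy activity with a Failed dependency condition, which is what consumes the failure so the ForEach iteration, and hence the pipeline, does not fail:

```json
{
  "name": "ForEachFolder",
  "type": "ForEach",
  "typeProperties": {
    "items": { "value": "@activity('Lookup1').output.value", "type": "Expression" },
    "activities": [
      { "name": "CopyFiles", "type": "Copy" },
      {
        "name": "WaitOnFailure",
        "type": "Wait",
        "dependsOn": [
          { "activity": "CopyFiles", "dependencyConditions": [ "Failed" ] }
        ],
        "typeProperties": { "waitTimeInSeconds": 1 }
      }
    ]
  }
}
```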

  • The first iteration of the Copy activity fails because the f1 folder contains the file that is open on my system, and yet the pipeline completes successfully without failing. The following is the debug output.

When I tried to get metadata from each folder, I got four files, whereas the F1 folder contains only three; because one file is open on my system, Get Metadata also returns its lock file with the ~$ prefix.

When the copy of that lock file is attempted, it throws an error for that particular Copy activity, but execution continues for the remaining files.

In the output, all of the Excel files were copied from the folder to the destination except the file that is open on my system.
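If you do go via Get Metadata, one option (my suggestion, not shown in the screenshots above) is to drop the ~$ lock files before copying with a Filter activity over the childItems output; the activity and reference names here are illustrative:

```json
{
  "name": "FilterLockFiles",
  "type": "Filter",
  "typeProperties": {
    "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
    "condition": { "value": "@not(startswith(item().name, '~$'))", "type": "Expression" }
  }
}
```

The Filter's output.value can then be fed to the ForEach instead of the raw childItems list.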

Note - To iterate over folders and then over the files within each folder, you need two ForEach loops. Since ForEach activities cannot be nested in Data Factory, the inner ForEach has to live in a child pipeline that the parent pipeline invokes.
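A sketch of that parent/child wiring (the pipeline and parameter names are illustrative assumptions): the outer ForEach calls the child pipeline via an Execute Pipeline activity, passing the current folder, and the child pipeline contains the inner ForEach over that folder's files:

```json
{
  "name": "RunChildPerFolder",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": { "referenceName": "CopyFilesInFolder", "type": "PipelineReference" },
    "parameters": {
      "FolderPath": { "value": "@item().DS_Directory_Name", "type": "Expression" }
    },
    "waitOnCompletion": true
  }
}
```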

Upvotes: 2
