venus

Reputation: 1258

Azure Data Factory: Dynamic path value for the Storage Event Trigger

I have created an Azure Data Factory pipeline to copy data from one ADLS container to another using a Copy data activity. The copy activity is fired by a storage event trigger.

So whenever a new file is generated, it triggers the pipeline.
The source file sits in a nested directory structure with dynamic folders for year, month, and day, which vary based on the date.

In the trigger, I specified the path up to the fixed folder, but I don't know what value to use for the dynamic part.
Initially I provided a path such as my/fixed/directory/*/*/*/,
but at execution time it throws a 'PathNotFound' exception.

So my question is: how can I provide the path to the storage event trigger when the folder structure is dynamic? The screenshots below show the setup:

Pipeline [screenshot]

Copy data activity source configuration [screenshot]

Copy data activity target configuration [screenshot]

Copy data activity source dataset configuration [screenshot]

Copy data activity target dataset configuration [screenshot]

Storage event trigger configuration [screenshot]

Upvotes: 1

Views: 2455

Answers (1)

Saideep Arikontham

Reputation: 6124

  • Wildcards are not supported for Blob path begins with or Blob path ends with in storage event triggers.
  • However, a storage event trigger created on the fixed parent directory will fire the pipeline for any file created/deleted in its child directories as well.
  • Let's say I have the folder structure shown below, where input/folder/2022 is my fixed directory (input is the container name). There are also subfolders within each of the folders shown; an illustrative layout follows the screenshot.

[screenshot of folder structure]
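For illustration only, the layout in the screenshot is assumed to look roughly like this (subfolder names will vary; sample1.csv is the file uploaded later in this answer):

input                              <- container
└── folder
    └── 2022                       <- fixed parent directory
        └── 03
            └── 01
                └── sample1.csv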

  • Now, I have created a Copy data activity. The folder path and file name dynamic content for the source dataset is shown below, with parameter values passed from the pipeline (a JSON sketch of this dataset follows the screenshot):
folder path:  @replace(dataset().folder_name,'input/','')

file name:  @dataset().file_name

[screenshot of source dataset configuration]
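A minimal sketch of what the parameterized source dataset definition might look like in JSON, assuming a delimited text dataset on ADLS Gen2 (the dataset and linked service names are placeholders; the parameters and expressions are the ones above):

{
    "name": "SourceDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "SourceAdlsLinkedService",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "folder_name": { "type": "string" },
            "file_name": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "input",
                "folderPath": {
                    "value": "@replace(dataset().folder_name,'input/','')",
                    "type": "Expression"
                },
                "fileName": {
                    "value": "@dataset().file_name",
                    "type": "Expression"
                }
            }
        }
    }
}

The point is that folderPath and fileName are expressions over the dataset parameters folder_name and file_name, so the copy activity can pass in values that ultimately come from the trigger. The replace(...) strips the container prefix input/ because @triggerBody().folderPath includes the container name, while the dataset's fileSystem already identifies the container.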

  • The folder path and file name dynamic content for the sink dataset is shown below. This is a different container named data (a sketch of the sink location block follows the screenshot):
folder path: @concat('output/',replace(dataset().folder,'input/folder/',''))

file name: @dataset().file

[screenshot of sink dataset configuration]
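In the same spirit, the sink dataset's location block could look like the fragment below (a sketch; the rest of the definition mirrors the source dataset, with parameters named folder and file, and the container is data as noted above):

"typeProperties": {
    "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "data",
        "folderPath": {
            "value": "@concat('output/',replace(dataset().folder,'input/folder/',''))",
            "type": "Expression"
        },
        "fileName": {
            "value": "@dataset().file",
            "type": "Expression"
        }
    }
}

At run time this turns, for example, input/folder/2022/03/01 into output/2022/03/01 inside the data container, so the source's date hierarchy is preserved on the sink side.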

  • After configuring the copy activity, create a storage event trigger.

[screenshot of storage event trigger configuration]

  • Here, the pipeline parameters folderName and fileName are set while creating the trigger as shown below (a sketch of the resulting trigger JSON follows the screenshot):
fileName : @triggerBody().fileName
folderName : @triggerBody().folderPath

[screenshot of trigger parameter mapping]
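Putting the previous steps together, the trigger definition behind these screenshots would look roughly like the sketch below (the trigger and pipeline names are placeholders and the storage account scope is elided; note that blobPathBeginsWith uses the /container/blobs/path convention and points only at the fixed parent directory, with no wildcards):

{
    "name": "NewFileTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/input/blobs/folder/2022/",
            "ignoreEmptyBlobs": true,
            "scope": "/subscriptions/.../resourceGroups/.../providers/Microsoft.Storage/storageAccounts/...",
            "events": [ "Microsoft.Storage.BlobCreated" ]
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyNewFilePipeline",
                    "type": "PipelineReference"
                },
                "parameters": {
                    "fileName": "@triggerBody().fileName",
                    "folderName": "@triggerBody().folderPath"
                }
            }
        ]
    }
}

The parameters block is where @triggerBody().fileName and @triggerBody().folderPath are mapped to the pipeline parameters fileName and folderName, which the copy activity then forwards to the dataset parameters shown earlier.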

  • After you attach the trigger and publish the pipeline, whenever a file is uploaded to any folder within the fixed directory folder/2022, the pipeline will be triggered.
  • I have uploaded a file to folder/2022/03/01/sample1.csv. This triggered the pipeline successfully.

[screenshot of triggered pipeline run]

  • The file is successfully copied as well. The following is an image for reference:

[screenshot of copied file in the sink container]

So, creating a storage event trigger on just the fixed parent directory is sufficient to trigger the pipeline for any file uploaded to its child directories.

Upvotes: 2
