Mariya

Copy files of different formats into different folders based on the date using Azure Data Factory

I am new to Azure Data Factory and am trying to solve a particular use case. I have to copy files from a source folder to a target folder, both in the same storage account. The files in the source folder are of different formats (csv, txt, xml) and have a date appended to the end of the name, e.g. addresses_2020-11-01.csv (date format: yyyy-mm-dd).

I need to create a pipeline that sorts the files into dynamic folders with this hierarchy: csv -> yyyy -> mm -> dd. My understanding is that I first have to filter the files by format, then use the split function to split the file name at the _, and then dynamically create the folders from the year, month, and day in the file name. The pipeline I have created so far is shown in the screenshot linked below (I am unable to embed the image itself).

[Pipeline to filter files, and copy to the destination folder]
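The hierarchy described above is essentially a mapping from file name to folder path. A minimal Python sketch of that mapping, assuming every dated file follows the <name>_yyyy-mm-dd.<ext> convention; the regular expression and the target_folder name are illustrative, not ADF features. Names that don't match return None instead of landing in a wrong folder:

```python
import re
from typing import Optional

# Assumed naming convention: <name>_<yyyy>-<mm>-<dd>.<ext>
# The greedy (?P<base>.+) lets the base name itself contain underscores.
FILE_PATTERN = re.compile(
    r"^(?P<base>.+)_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\.(?P<ext>[A-Za-z0-9]+)$"
)


def target_folder(file_name: str) -> Optional[str]:
    """Return '<ext>/<yyyy>/<mm>/<dd>' for a matching name, else None."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        return None  # name does not follow the convention
    # Lower-case the extension so .CSV and .csv end up in the same folder.
    return "/".join([match["ext"].lower(), match["year"], match["month"], match["day"]])


for name in ["addresses_2020-11-01.csv", "orders_2021-03-15.xml", "readme.txt"]:
    print(name, "->", target_folder(name))
```

Returning None for non-matching names makes it easy to spot files that would otherwise break a purely index-based split.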

What I have done:

  1. Use Get Metadata to extract childItems.
  2. Filter the output of Get Metadata into csv, txt, and xml files.
  3. Use a ForEach activity that contains a Copy activity. It copies the files from the Filter activity into their respective folders (csv, txt, ...), since the wildcard contains *.txt, *.csv, *.xml.
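Step 2 can be pictured outside ADF as a filter over the childItems array. The Python below is a rough analogue, assuming the usual childItems shape (a list of objects with name and type); the sample items and filter_by_extension are hypothetical. In the Filter activity itself, a condition along the lines of @endswith(item().name,'.csv') would play the same role.

```python
# Hypothetical sample shaped like Get Metadata's childItems output:
# each entry has a "name" and a "type" ("File" or "Folder").
child_items = [
    {"name": "addresses_2020-11-01.csv", "type": "File"},
    {"name": "notes_2020-11-02.txt", "type": "File"},
    {"name": "orders_2020-11-03.xml", "type": "File"},
    {"name": "archive", "type": "Folder"},
]


def filter_by_extension(items, extension):
    """Keep only files whose name ends with '.<extension>' (case-sensitive, as str.endswith is)."""
    suffix = "." + extension
    return [
        item for item in items
        if item["type"] == "File" and item["name"].endswith(suffix)
    ]


for ext in ("csv", "txt", "xml"):
    names = [item["name"] for item in filter_by_extension(child_items, ext)]
    print(ext, names)
```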

I am not sure how to proceed once the files are filtered so that folders are created dynamically from the dates in the file names. I think I need a Set Variable activity along with the Copy activity, but I am not sure how to do this. Any help will be appreciated.

Thanks!!

Answers (1)

Steve Johnson

If you only want to copy the files, there is no need to use a different format for each file type. You can just use the Binary format for both datasets. Something like this:

Steps:

1. Use Get Metadata to extract childItems. [screenshot]

2. Use a ForEach activity that contains a Copy activity, which copies each file to its own folder.

ForEach Items expression: @activity('Get Metadata1').output.childItems [screenshot]

Copy activity source: [screenshot]

Source dataset: [screenshot]

Copy activity sink: [screenshot]

Sink dataset:

Expression (this works for your example file name addresses_2020-11-01.csv, producing csv/2020/11/01):

    @concat(split(item().name,'.')[1],'/',split(split(item().name,'_')[1],'-')[0],'/',split(split(item().name,'_')[1],'-')[1],'/',split(split(split(item().name,'_')[1],'-')[2],'.')[0])

[screenshot]
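To check what the sink expression does piece by piece, here is a direct Python translation. adf_sink_path is a made-up name; ADF's split() returns a 0-indexed array, so split(x,'c')[i] corresponds to x.split('c')[i]. Like the expression, it assumes the name contains exactly one '_' and one '.':

```python
def adf_sink_path(name: str) -> str:
    """Python mirror of the ADF concat/split expression for the sink folder."""
    ext = name.split(".")[1]                     # split(item().name,'.')[1]       -> 'csv'
    date_part = name.split("_")[1]               # split(item().name,'_')[1]       -> '2020-11-01.csv'
    year = date_part.split("-")[0]               # split(...,'-')[0]               -> '2020'
    month = date_part.split("-")[1]              # split(...,'-')[1]               -> '11'
    day = date_part.split("-")[2].split(".")[0]  # split(split(...,'-')[2],'.')[0] -> '01'
    return ext + "/" + year + "/" + month + "/" + day  # concat(..., '/', ...)


print(adf_sink_path("addresses_2020-11-01.csv"))  # csv/2020/11/01
```

For a name such as home_addresses_2020-11-01.csv, the second segment after splitting on '_' is 'addresses', so the later [1] index has nothing to pick and the path cannot be built (Python raises IndexError here). If such names are possible, applying ADF's last() function to the split results instead of fixed indexes may be more robust; test it against your real file names first.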

Files in the source folder: [screenshot]

Result: [screenshot]
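To try the same layout locally before running the pipeline, the ForEach + binary Copy combination can be approximated on disk. A sketch, assuming plain local folders stand in for the storage containers; copy_to_dated_folders is a hypothetical helper, and shutil.copy2 plays the role of the Binary copy (bytes copied as-is, no parsing):

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def copy_to_dated_folders(source: Path, sink: Path) -> List[Path]:
    """Copy each <name>_<yyyy>-<mm>-<dd>.<ext> file in `source` to sink/<ext>/<yyyy>/<mm>/<dd>/.

    Files that do not follow that pattern are skipped. Returns the destination paths.
    """
    copied = []
    for path in sorted(source.iterdir()):  # plays the role of iterating childItems
        if not path.is_file() or not path.suffix:
            continue
        _base, sep, date_part = path.stem.rpartition("_")
        parts = date_part.split("-")
        if not sep or len(parts) != 3:
            continue  # no "_yyyy-mm-dd" suffix, so no target folder can be derived
        year, month, day = parts
        target_dir = sink / path.suffix.lstrip(".") / year / month / day
        # Locally the folders must exist before copying, so create them explicitly.
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 copies the bytes unchanged (plus metadata such as timestamps).
        copied.append(Path(shutil.copy2(path, target_dir / path.name)))
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "source"), Path(tmp, "target")
        src.mkdir()
        (src / "addresses_2020-11-01.csv").write_text("id,street\n")
        (src / "readme.txt").write_text("no date in this name\n")
        for copied_path in copy_to_dated_folders(src, dst):
            print(copied_path.relative_to(dst).as_posix())  # csv/2020/11/01/addresses_2020-11-01.csv
```

The undated readme.txt is skipped rather than copied, which mirrors what you would want the pipeline to do with files outside the naming convention.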
