Sindhur

Reputation: 11

Azure Data Factory pipeline query

I have an Azure Data Factory requirement. There are 50 CSV files, and each file is named like Product, Department, Employee, Sales, etc. Each of these files has a different number of columns. In Azure SQL Database, I have 50 tables named Product, Department, Employee, Sales, etc. The columns of each table match those of its corresponding file. Every day, I receive a new set of files in an Azure Data Lake Storage Gen2 folder at 11 PM CST. At 12:05 AM CST, each of these files should be loaded into its respective table.

There should be only one pipeline, or there can be two pipelines where the parent pipeline collects the file metadata and supplies it to a child pipeline that does the data load. The pipeline should find the files with the previous day's timestamp, loop through them, and load each one into its respective target table, one by one. Can someone briefly explain the activities and transformations I need to use to fulfil this requirement?

I am new to ADF. I haven't tried anything so far.

Upvotes: 0

Views: 124

Answers (1)

Rakesh Govindula

Reputation: 11539

Each of these files has a unique number of columns. In Azure SQL Database, I have 50 tables like Product, Department, Employee, Sales, etc. The columns of each table match with its corresponding file.

As you have the same columns for both source and target, and the same names for the files and tables, the process below will work for you as long as the source and target share the same schema.

First, use a Get Metadata activity on the source folder to get the list of files (add Child items to its field list).

To pick up only the latest uploaded files, use the Filter by last modified option in the Get Metadata activity. This option only supports UTC, and CST is UTC-6, so cross-check both time zones and give the Start time and End time as per your requirement, using the appropriate date/time functions.

For a sample, the Start time and End time can be given as in the sketch below.

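As an illustration only, this is roughly how such a Get Metadata activity looks in pipeline JSON. SourceFolderDataset is a placeholder dataset name, and the window expressions simply take everything modified in the last 24 hours (UTC), so adjust them to your exact CST window:

```json
{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SourceFolderDataset",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ],
        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "modifiedDatetimeStart": {
                "value": "@addDays(utcNow(), -1)",
                "type": "Expression"
            },
            "modifiedDatetimeEnd": {
                "value": "@utcNow()",
                "type": "Expression"
            }
        },
        "formatSettings": {
            "type": "DelimitedTextReadSettings"
        }
    }
}
```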

This will return a childItems result array like the sample below.

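The childItems output has roughly this shape (the file names here are only illustrative, taken from the question):

```json
{
    "childItems": [
        { "name": "Product.csv", "type": "File" },
        { "name": "Department.csv", "type": "File" },
        { "name": "Employee.csv", "type": "File" },
        { "name": "Sales.csv", "type": "File" }
    ]
}
```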

Give this childItems array, @activity('Get Metadata1').output.childItems, as the Items of a ForEach activity. Inside the ForEach, use a Copy activity to copy each file, one per iteration.
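
A minimal sketch of the ForEach wiring in pipeline JSON; the activity names are placeholders, the nested Copy activity's source and sink are configured in the following steps, and isSequential is set to true to load the tables one by one, as asked:

```json
{
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "Get Metadata1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression"
        },
        "isSequential": true,
        "activities": [
            { "name": "Copy data1", "type": "Copy" }
        ]
    }
}
```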

Create another source dataset and add a dataset parameter (here, sourcefiename), then use it in the dataset's file name, as in the sketch below.

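A sketch of such a parameterized source dataset, assuming a DelimitedText dataset on an ADLS Gen2 linked service; the dataset, linked service, container, and folder names are placeholders:

```json
{
    "name": "SourceFileDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "sourcefiename": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileName": {
                    "value": "@dataset().sourcefiename",
                    "type": "Expression"
                },
                "folderPath": "daily",
                "fileSystem": "input"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}
```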

Use this dataset as the Copy activity's source and assign @item().name as the parameter value (see the Copy activity sketch further below).


In the sink, create an Azure SQL Database dataset with two dataset parameters, schema and table_name, and use them in the dataset's schema and table properties, as in the sketch below.

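A sketch of the parameterized sink dataset, assuming an Azure SQL table dataset; the dataset and linked service names are placeholders:

```json
{
    "name": "AzureSqlTable1",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabase1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "schema": { "type": "string" },
            "table_name": { "type": "string" }
        },
        "typeProperties": {
            "schema": {
                "value": "@dataset().schema",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().table_name",
                "type": "Expression"
            }
        },
        "schema": []
    }
}
```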

From @item().name, extract the file name without the '.csv' part using split, and pass that to the table_name parameter: @split(item().name,'.c')[0]

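Putting the previous steps together, the Copy activity inside the ForEach references both parameterized datasets roughly like this. Again a sketch with placeholder names; the schema value of dbo is an assumption, and the source and sink would normally also carry store/format settings generated by the UI:

```json
{
    "name": "Copy data1",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "AzureSqlSink" }
    },
    "inputs": [
        {
            "referenceName": "SourceFileDataset",
            "type": "DatasetReference",
            "parameters": {
                "sourcefiename": {
                    "value": "@item().name",
                    "type": "Expression"
                }
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "AzureSqlTable1",
            "type": "DatasetReference",
            "parameters": {
                "schema": "dbo",
                "table_name": {
                    "value": "@split(item().name,'.c')[0]",
                    "type": "Expression"
                }
            }
        }
    ]
}
```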

Now, schedule this pipeline at the required time in your time zone using a schedule trigger, as in the sketch below.

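For illustration, a schedule trigger that fires daily at 12:05 AM Central time could look like the sketch below; the trigger name, pipeline name, and start date are placeholders:

```json
{
    "name": "DailyLoadTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2023-06-01T00:05:00",
                "timeZone": "Central Standard Time",
                "schedule": {
                    "minutes": [ 5 ],
                    "hours": [ 0 ]
                }
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "LoadDailyFilesPipeline",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}
```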

Upvotes: 1
