Vitali Dedkov

Reputation: 318

GetMetadata to get the full file directory in Azure Data Factory

I am working through a use case where I want to load the names of all folders that have already been loaded into an Azure Database into a separate "control" table, but I am having problems using the GetMetadata activity properly.

The purpose of this use case is to skip all of the older folders (the ones already loaded), focus only on the new folder, and get the ".gz" file from it to load into an Azure Database. At a high level, I thought I would use the GetMetadata activity to send all of the folder names to a stored procedure. That stored procedure would then load those folder names with a status of '1' (meaning successful).
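A minimal sketch of that control-table load, assuming Python's built-in sqlite3 as a stand-in for the Azure SQL stored procedure. The table name `loaded_folders`, its columns, and the function `record_loaded_folders` are hypothetical; the only detail taken from the question is that each loaded folder gets status 1.

```python
from __future__ import annotations

import sqlite3


def record_loaded_folders(conn: sqlite3.Connection, folder_paths: list[str]) -> None:
    """Insert each folder path with status 1 (loaded successfully)."""
    # Hypothetical control table: one row per folder already loaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_folders ("
        " folder_path TEXT PRIMARY KEY,"
        " status INTEGER NOT NULL)"
    )
    # INSERT OR IGNORE keeps the load idempotent if a folder is sent twice.
    conn.executemany(
        "INSERT OR IGNORE INTO loaded_folders (folder_path, status) VALUES (?, 1)",
        [(path,) for path in folder_paths],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
record_loaded_folders(conn, ["2019/12/26", "2019/12/27"])
record_loaded_folders(conn, ["2019/12/27"])  # duplicate is ignored
print(conn.execute(
    "SELECT folder_path, status FROM loaded_folders ORDER BY folder_path"
).fetchall())
```

Making the load idempotent means re-running the pipeline for the same folders does not create duplicate control rows.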

That table would then be used in a separate pipeline that loads files into a database. I would use a Lookup activity to compare against the already loaded folders, and if one of them doesn't match, that would be the folder to get the file from (the source is an S3 bucket).
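The comparison step boils down to a set difference between the folders in the source and the folders in the control table. A small Python sketch of that logic, where `available` stands in for the S3 folder list and `loaded` for the rows a Lookup activity would return (both names and `find_new_folders` are hypothetical):

```python
from __future__ import annotations


def find_new_folders(available: list[str], loaded: list[str]) -> list[str]:
    """Return source folders that are not yet recorded in the control table."""
    loaded_set = set(loaded)  # O(1) membership checks
    # Keep the source order and drop anything already marked as loaded.
    return [folder for folder in available if folder not in loaded_set]


available = ["2019/12/25", "2019/12/26", "2019/12/27"]
loaded = ["2019/12/25", "2019/12/26"]
print(find_new_folders(available, loaded))  # ['2019/12/27']
```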

The folder structure is nested in the YYYY/MM/DD format (ex: 2019/12/27), where a new folder is created each day and a ".gz" file is placed in it.
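Because the folders follow a fixed YYYY/MM/DD pattern, the expected folder path for any day can be derived from the date. A sketch in Python, assuming month and day are zero-padded (the example 2019/12/27 does not show this, so it is an assumption); `folder_for` and `folders_between` are hypothetical helpers:

```python
from __future__ import annotations

from datetime import date, timedelta


def folder_for(day: date) -> str:
    # Zero-padded YYYY/MM/DD, e.g. date(2019, 12, 27) -> "2019/12/27"
    return day.strftime("%Y/%m/%d")


def folders_between(start: date, end: date) -> list[str]:
    """All daily folder paths from start to end, inclusive."""
    days = (end - start).days
    return [folder_for(start + timedelta(days=i)) for i in range(days + 1)]


print(folders_between(date(2019, 12, 30), date(2020, 1, 2)))
# ['2019/12/30', '2019/12/31', '2020/01/01', '2020/01/02']
```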

I created an ADF pipeline using the "GetMetadata" activity, pointing to the blob storage that the folders have already been loaded into.


However, when I run this pipeline I only get the three top-level folder names: 2019, 2018, 2017.


Is it possible to get not only the top-level folder names but to go all the way down to the day level? So instead of the output being "2019", it would be "2019/12/26", the next one would be "2019/12/27", and so on for all of the months and days in 2017 and 2018 as well.

If anyone has faced this issue, any insight would be greatly appreciated.

Thank you

Upvotes: 3

Views: 10115

Answers (2)

S.Volki

Reputation: 66

You can also use wildcard placeholders in this case, if you have a defined and unchanging folder structure.

Use as the directory: storageroot/*/*/*/filename

For example, I used csvFiles/*/*/*/*/*/*/*.csv to get all files that have this structure:

csvFiles/topic/subtopic/country/year/month/day

[Screenshot: example of wildcards in the data source path]

Then you get all files in this folder structure.
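To illustrate the idea of a fixed-depth wildcard path, here is a Python sketch that matches a path segment by segment, assuming each `*` stands for exactly one folder level. This is only an illustration of the concept using the standard library's `fnmatch`; it is not ADF's own matching code, and `matches_wildcard_path` is a hypothetical name.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def matches_wildcard_path(path: str, pattern: str) -> bool:
    """Match path against pattern one folder level at a time."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # A fixed-depth pattern only matches paths with the same number of levels.
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, pat) for p, pat in zip(path_parts, pattern_parts))


pattern = "csvFiles/*/*/*/*/*/*/*.csv"
print(matches_wildcard_path("csvFiles/sales/eu/de/2019/12/27/data.csv", pattern))  # True
print(matches_wildcard_path("csvFiles/sales/eu/2019/12/27/data.csv", pattern))     # False
```

The second path fails because it is one folder level short, which is why this approach depends on the folder structure never changing.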

Upvotes: 4

Jay Gong

Reputation: 23792

According to the Get Metadata activity documentation, childItems only returns the elements directly under the specified path; it won't include items in subfolders.
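That behaviour explains why only 2019, 2018 and 2017 come back. A Python sketch that mimics one level of listing over flat blob names, returning entries shaped like childItems (`name` and `type` of "Folder" or "File"); the function `child_items` and the sample blob names are hypothetical, and this is a simulation, not the activity itself:

```python
from __future__ import annotations


def child_items(blob_names: list[str], folder: str = "") -> list[dict]:
    """Return only the direct children of `folder`, shaped like childItems."""
    prefix = f"{folder.rstrip('/')}/" if folder else ""
    seen: dict[str, str] = {}
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        first, sep, _ = rest.partition("/")
        # Anything deeper than one level collapses into a single Folder entry.
        seen.setdefault(first, "Folder" if sep else "File")
    return [{"name": n, "type": t} for n, t in seen.items()]


blobs = ["2017/01/01/a.gz", "2018/06/15/b.gz", "2019/12/26/c.gz", "2019/12/27/d.gz"]
print(child_items(blobs))                # only the year folders
print(child_items(blobs, "2019/12/27"))  # [{'name': 'd.gz', 'type': 'File'}]
```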


I suppose you have to use a ForEach activity to loop through the childItems array layer by layer to flatten the whole structure. At the same time, use a Set Variable activity to concatenate the complete folder path. Then use an If Condition activity: when you detect that the element type is a file, not a folder, you can call the stored procedure you mentioned in your question.
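The layer-by-layer idea can be sketched in Python as a breadth-first walk: list one level, build the full path for each child, queue folders for the next layer, and record the parent folder when a file is found. The in-memory `STORAGE` tree and `get_child_items` are hypothetical stand-ins for the storage account and one GetMetadata call per path; in ADF itself each step would be the ForEach, Set Variable and If Condition activities described above.

```python
from __future__ import annotations

from collections import deque
from typing import Callable

# Hypothetical stand-in for the storage account: folders are dicts, files are None.
STORAGE = {
    "2018": {"12": {"31": {"data.gz": None}}},
    "2019": {"12": {"26": {"data.gz": None}, "27": {"data.gz": None}}},
}


def get_child_items(path: str) -> list[dict]:
    """Simulate GetMetadata childItems: direct children of `path` only."""
    node = STORAGE
    for part in filter(None, path.split("/")):
        node = node[part]
    return [
        {"name": name, "type": "Folder" if isinstance(child, dict) else "File"}
        for name, child in node.items()
    ]


def collect_file_folders(list_children: Callable[[str], list[dict]]) -> list[str]:
    """Walk folders layer by layer; return each folder that contains a file."""
    found: list[str] = []
    queue = deque([""])  # start at the container root
    while queue:
        path = queue.popleft()
        for item in list_children(path):
            # Concatenate the full path, as a Set Variable activity would.
            full = f"{path}/{item['name']}" if path else item["name"]
            if item["type"] == "Folder":
                queue.append(full)  # visit this folder in the next layer
            elif path not in found:
                found.append(path)  # a file: record its folder once
    return found


print(collect_file_folders(get_child_items))
# ['2018/12/31', '2019/12/26', '2019/12/27']
```

Each recorded folder is what would be passed to the stored procedure.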

Upvotes: 3
