Reputation: 35
I'm trying to copy a bunch of files from a folder in Azure Data Lake Storage Gen2 using Data Factory. I need to access the file names of the blobs I am copying at runtime so that I can inspect them and parse each name to retrieve important metadata. Unfortunately, I do not see a way to do this using Data Factory V2 on Azure. If anyone knows how to do this, I would greatly appreciate it if you shared it with me.
Upvotes: 1
Views: 2104
Reputation: 69
Below are the steps to retrieve the file names from 'Azure Data Lake Storage Gen2' at runtime and copy the files.
Step 1) Retrieve the file names and file types with a 'Get Metadata' activity. In the activity's Field list, add 'Child items' to retrieve the metadata of the folders and files inside the folder.
[Screenshot: output of the 'Get Metadata' activity]
The 'Get Metadata' activity also supports the 'Azure Data Lake Storage Gen2' connector:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity
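To make it clearer what Step 1 hands to the next activity, here is a small Python sketch that mimics it. It assumes the output has the `childItems` shape described in the documentation above (an array of objects with `name` and `type`); the sample file and folder names are hypothetical. Inside the pipeline itself, a 'Filter' activity with a condition such as `@equals(item().type, 'File')` can do the same filtering before the loop.

```python
from typing import Any

# Hypothetical sample shaped like the 'Get Metadata' output when the
# Field list contains 'Child items': each entry has a name and a type.
sample_output: dict[str, Any] = {
    "childItems": [
        {"name": "sales_2021-03-15_emea.csv", "type": "File"},
        {"name": "archive", "type": "Folder"},
        {"name": "sales_2021-03-16_apac.csv", "type": "File"},
    ]
}


def file_names(get_metadata_output: dict[str, Any]) -> list[str]:
    """Return the names of the files (not folders) listed in childItems."""
    return [
        item["name"]
        for item in get_metadata_output.get("childItems", [])
        if item.get("type") == "File"
    ]


if __name__ == "__main__":
    print(file_names(sample_output))
```

Folders are skipped because only entries whose `type` is `File` are kept; a missing `childItems` key yields an empty list.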
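Step 2) Iterate through the files with a 'ForEach' activity. In the 'ForEach' activity's Settings, set Items to the expression @activity('Get Metadata1').output.childItems so that it takes its input from the 'Get Metadata' activity. Inside the loop, @item().name returns the name of the current file, which you can parse to extract the metadata you need.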
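Simple parsing can be done in the pipeline with expressions such as `@split(item().name, '_')`. If the parsing happens outside Data Factory instead (for example, in code that the loop calls), the Python sketch below shows the idea. The naming convention `<dataset>_<YYYY-MM-DD>_<region>.<extension>` and the `FileMetadata` fields are purely hypothetical; adapt the pattern to your own file names.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Hypothetical naming convention: <dataset>_<YYYY-MM-DD>_<region>.<extension>
FILE_NAME_PATTERN = re.compile(
    r"^(?P<dataset>[A-Za-z]+)_(?P<day>\d{4}-\d{2}-\d{2})_(?P<region>[A-Za-z]+)\.(?P<ext>\w+)$"
)


class FileMetadata(NamedTuple):
    dataset: str
    day: date
    region: str
    extension: str


def parse_file_name(name: str) -> Optional[FileMetadata]:
    """Parse a file name (e.g. the value of @item().name).

    Returns None if the name does not follow the expected convention
    or if the embedded date is not a valid calendar date.
    """
    match = FILE_NAME_PATTERN.match(name)
    if match is None:
        return None
    try:
        day = date.fromisoformat(match["day"])
    except ValueError:
        # e.g. "2021-13-01" matches the pattern but is not a real date
        return None
    return FileMetadata(
        dataset=match["dataset"],
        day=day,
        region=match["region"],
        extension=match["ext"],
    )
```

Returning `None` for non-matching names lets the caller decide whether to skip such files or fail the run.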
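Step 3) Inside the 'ForEach' activity, use a 'Copy data' activity to copy the required files. Configure the source and sink settings to specify where each file is copied from and where it is copied to (for example, by passing @item().name to a file-name parameter on the source dataset).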
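A common reason to parse the names is to decide where each file lands. The Python sketch below assumes a hypothetical layout in which files whose names contain a date are copied into `YYYY/MM/DD` subfolders of the sink; the folder names are illustrative only. In the pipeline itself, the equivalent path would be built with expressions (for example `concat`) in the sink dataset's parameters.

```python
import re
from pathlib import PurePosixPath

# Hypothetical: a date such as 2021-03-15 embedded somewhere in the file name.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def copy_paths(file_name: str, source_folder: str, sink_root: str) -> tuple[str, str]:
    """Return (source_path, sink_path) for one file of the ForEach loop.

    Files whose names contain a date go to sink_root/YYYY/MM/DD/;
    all other files go directly to sink_root.
    """
    source = PurePosixPath(source_folder) / file_name
    sink_dir = PurePosixPath(sink_root)
    match = DATE_IN_NAME.search(file_name)
    if match is not None:
        # Append the year, month and day as nested subfolders.
        sink_dir = sink_dir.joinpath(*match.groups())
    return str(source), str(sink_dir / file_name)
```

`PurePosixPath` is used because storage paths use forward slashes regardless of the machine the code runs on.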
[Screenshot: the complete pipeline]
Upvotes: 0
Reputation: 23792
You could look at the Get Metadata activity in Azure Data Factory, which retrieves the metadata of data stored in supported data stores.
However, at the time of writing, only a subset of connectors was supported, and ADLS Gen2 was not among them.
Since your data is stored in ADLS Gen2, you could try copying the data from ADLS Gen2 to Azure Blob Storage and then use the Get Metadata activity to access the file names in the folder: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity#get-a-folders-metadata
Upvotes: 1