Reputation: 77
I will get straight to the point. This is the problem:
I have an Azure storage account with Blob storage in which I have multiple containers. In these containers I have a "folder-like structure" made up of directories and subdirectories (I guess this is the proper terminology, because the dataset has a "Directory" field right after the container, as you can see in the picture).
The structure is as follows (for simplicity I will shorten it, but it is still representative):
I need to get metadata from the CSV files (particularly the file names) so I can add additional logic to the pipeline that tells it which files to copy. What is the best solution to get these file names?
I have tried to use a ForEach activity. First I created a Dataset where I only specified the container name and used it in a Get Metadata activity, which gave me the output as a list of years (I listed childItems). Then I created another, parameterized Dataset where I defined the directory as @dataset().FileName (I did not define the file name). I used this dataset in the ForEach loop with a Get Metadata activity and was able to get the list of month numbers, as you can see in the file structure above.

Then I went on to create a third dataset (I already suspected this was a dead end, but I gave it a shot) where I wanted to concatenate two parameters in the directory field. Here I found out that I cannot use a parameter of one dataset in another dataset. So I thought maybe I could use a variable, but that did not work either: I got an error every time I tried to use the variable in "Add dynamic content". Finally, I tried a dataset where I defined only the container and the file name, but I only got results for the default value set for the file name at the dataset level.
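For reference, the childItems output of that first Get Metadata activity comes back in roughly this shape (the year folder names here are just examples):

    {
      "childItems": [
        { "name": "2020", "type": "Folder" },
        { "name": "2021", "type": "Folder" }
      ]
    }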
Since I am quite new to ADF and building pipelines, I wonder what I am missing. What would be your proposed solution to get the file names of the CSV files so I can use them later in the pipeline?
Upvotes: 0
Views: 4622
Reputation: 5074
I have reproduced this by iterating through the nested sub-folders, using Execute Pipeline activities inside ForEach activities.
Source dataset:
Create a dataset for the source and add a dataset parameter so the directory value can be passed dynamically.
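A minimal sketch of what such a parameterized dataset's JSON could look like; the names DS_Source, AzureBlobStorage1, mycontainer, and the parameter dir are placeholders I chose, not values from the original setup:

    {
      "name": "DS_Source",
      "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
          "referenceName": "AzureBlobStorage1",
          "type": "LinkedServiceReference"
        },
        "parameters": {
          "dir": { "type": "string", "defaultValue": "" }
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "container": "mycontainer",
            "folderPath": {
              "value": "@dataset().dir",
              "type": "Expression"
            }
          },
          "columnDelimiter": ",",
          "firstRowAsHeader": true
        }
      }
    }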
Main pipeline:
1. Get Metadata activity: get the folders inside the given container.
2. ForEach activity: iterate over that folder list. Inside the ForEach, add an Execute Pipeline activity to call another pipeline that gets the subfolders for the current item (@item().name). A sketch of this pipeline follows the list.
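As a rough sketch, assuming the dataset above, the main pipeline could look like this (all names except the dir1 parameter are placeholders):

    {
      "name": "MainPipeline",
      "properties": {
        "activities": [
          {
            "name": "GetYearFolders",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": {
                "referenceName": "DS_Source",
                "type": "DatasetReference",
                "parameters": { "dir": "" }
              },
              "fieldList": [ "childItems" ]
            }
          },
          {
            "name": "ForEachYear",
            "type": "ForEach",
            "dependsOn": [
              { "activity": "GetYearFolders", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "items": {
                "value": "@activity('GetYearFolders').output.childItems",
                "type": "Expression"
              },
              "activities": [
                {
                  "name": "CallChildPipeline1",
                  "type": "ExecutePipeline",
                  "typeProperties": {
                    "pipeline": {
                      "referenceName": "ChildPipeline1",
                      "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                      "dir1": { "value": "@item().name", "type": "Expression" }
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }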
Child pipeline1 (to get the subfolders):
1. Get Metadata activity: get the list of subfolders, using the parameters in the dataset. Dataset property value: @concat(pipeline().parameters.dir1,'/')
2. ForEach over those subfolders, calling the next child pipeline with the path @concat(pipeline().parameters.dir1,'/',item().name,'/') (see the sketch after this list).
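A sketch of ChildPipeline1 under the same assumptions, wiring the two expressions above into the dataset parameter and the Execute Pipeline parameter (the path parameter name matches the expression used in child pipeline2 below):

    {
      "name": "ChildPipeline1",
      "properties": {
        "parameters": { "dir1": { "type": "string" } },
        "activities": [
          {
            "name": "GetMonthFolders",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": {
                "referenceName": "DS_Source",
                "type": "DatasetReference",
                "parameters": {
                  "dir": {
                    "value": "@concat(pipeline().parameters.dir1,'/')",
                    "type": "Expression"
                  }
                }
              },
              "fieldList": [ "childItems" ]
            }
          },
          {
            "name": "ForEachMonth",
            "type": "ForEach",
            "dependsOn": [
              { "activity": "GetMonthFolders", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "items": {
                "value": "@activity('GetMonthFolders').output.childItems",
                "type": "Expression"
              },
              "activities": [
                {
                  "name": "CallChildPipeline2",
                  "type": "ExecutePipeline",
                  "typeProperties": {
                    "pipeline": {
                      "referenceName": "ChildPipeline2",
                      "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                      "path": {
                        "value": "@concat(pipeline().parameters.dir1,'/',item().name,'/')",
                        "type": "Expression"
                      }
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }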
Child pipeline2 (gets the files and processes them):
1. Pass the output childItems to a ForEach activity. Inside the ForEach, you can use a Filter activity to filter down to just the files.
2. Use a Copy data activity to copy the required files to the sink (a sketch follows below).
Dataset properties: Dir - @concat(pipeline().parameters.path,'/',item().name)
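A sketch of the Filter and Copy data activities in child pipeline2. One common arrangement is to filter first and then loop, so here the Filter runs on the output of a Get Metadata activity (GetFileList, a placeholder name) and the Copy sits inside a ForEach over @activity('FilterFiles').output.value; the endswith check for .csv and the DS_Sink dataset are my assumptions, not from the original answer:

    {
      "name": "FilterFiles",
      "type": "Filter",
      "typeProperties": {
        "items": {
          "value": "@activity('GetFileList').output.childItems",
          "type": "Expression"
        },
        "condition": {
          "value": "@and(equals(item().type, 'File'), endswith(item().name, '.csv'))",
          "type": "Expression"
        }
      }
    }

and, inside the ForEach, the Copy data activity with the parameterized source path:

    {
      "name": "CopyCsvFile",
      "type": "Copy",
      "inputs": [
        {
          "referenceName": "DS_Source",
          "type": "DatasetReference",
          "parameters": {
            "dir": {
              "value": "@concat(pipeline().parameters.path,'/',item().name)",
              "type": "Expression"
            }
          }
        }
      ],
      "outputs": [
        { "referenceName": "DS_Sink", "type": "DatasetReference" }
      ],
      "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "DelimitedTextSink" }
      }
    }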
Upvotes: 1