user17843794

Reputation: 23

Is there a way to export metadata from Azure Data Factory to a CSV file?

I intend to export a list of files in an ADLS folder to a CSV file using Azure Data Factory.

For instance, I have the following folder structure.

ADLS > FOLDER1 > File1
                 File2
                 File3

Now, I am using the Get Metadata activity in Azure Data Factory to get the child items.

{
    "childItems": [
        {
            "name": "DemoFile1",
            "type": "File"
        },
        {
            "name": "DemoFile2",
            "type": "File"
        }
    ],
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime",
    "executionDuration": 0
}

I want to export this output into a CSV file. Is there a way?

Upvotes: 1

Views: 2277

Answers (1)

NiharikaMoola

Reputation: 5074

There are a couple of options to store the list of source file names in a CSV file.

Option 1:

As mentioned by @Nick.McDermaid in the comments, you can use the flatten transformation in a data flow by passing the Get Metadata output to the data flow activity in the pipeline, as I have reproduced in my lab.

Input: (screenshot)

  1. Using the Get Metadata activity, get the list of files from the folder.


  2. Create an array variable and a string variable to store the values.


  3. Pass the Get Metadata child items output to the ForEach activity. Inside the ForEach activity, add an Append Variable activity to append each file name to the array variable (a JSON sketch of these activities is shown after step 5):

@activity('Get Metadata1').output.childItems


@item().name


  4. Convert the array variable to a string value and store it in the string variable created earlier:

@join(variables('file_list'),',')


Set variable output: (screenshot)

  5. Create a data flow to flatten the joined string, and add it to the pipeline after the Set Variable activity.

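For reference, the Get Metadata / ForEach / Set Variable part of this pipeline would look roughly like the JSON fragment below (the relevant entries of the pipeline's activities list). The activity, dataset and variable names (Get Metadata1, SourceFolderDataset, file_list, file_list_string) are illustrative rather than required; file_list is the array variable and file_list_string is the string variable from step 2.

    {
        "name": "Get Metadata1",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
            "fieldList": [ "childItems" ]
        }
    },
    {
        "name": "ForEach1",
        "type": "ForEach",
        "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
            "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
            "activities": [
                {
                    "name": "Append variable1",
                    "type": "AppendVariable",
                    "typeProperties": {
                        "variableName": "file_list",
                        "value": { "value": "@item().name", "type": "Expression" }
                    }
                }
            ]
        }
    },
    {
        "name": "Set variable1",
        "type": "SetVariable",
        "dependsOn": [ { "activity": "ForEach1", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
            "variableName": "file_list_string",
            "value": { "value": "@join(variables('file_list'), ',')", "type": "Expression" }
        }
    }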

Data Flow:

  1. First, create a parameter inside the data flow to receive the pipeline variable value.


  2. Connect the source to a dummy file.


  3. Add a derived column transformation to convert the string parameter value to an array:

split($get_metadata_output, ',')


  4. Add a flatten transformation after the derived column transformation and flatten the array column created there. Under Input columns, add a mapping for that column only (this removes any columns extracted from the dummy file). A data flow script sketch of the whole flow is shown after step 5.


  5. Add a sink transformation and connect it to the sink dataset. In the sink settings, you can provide the output file name.

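Putting these data flow steps together, the data flow script would look roughly like this. The stream names (DummySource, DerivedColumn1, Flatten1, Sink1), the array column name file_names, and the parameter name get_metadata_output are illustrative; the exact source and sink settings depend on your datasets:

    parameters{
        get_metadata_output as string
    }
    source(output(
            dummy as string
        ),
        allowSchemaDrift: true,
        validateSchema: false) ~> DummySource
    DummySource derive(file_names = split($get_metadata_output, ',')) ~> DerivedColumn1
    DerivedColumn1 foldDown(unroll(file_names),
        mapColumn(
            file_name = file_names
        )) ~> Flatten1
    Flatten1 sink(allowSchemaDrift: true,
        validateSchema: false) ~> Sink1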

Pipeline:

  1. In the pipeline, add the data flow created above and pass the Set Variable value to the data flow parameter.

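For reference, the value handed to the data flow parameter is simply the string variable set in step 4 of the pipeline, for example (file_list_string is the illustrative variable name from the JSON sketch above):

@variables('file_list_string')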

Output: (screenshot)
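Illustratively, using the two files from the question, the resulting CSV would contain one file name per row (the column name depends on the mapping chosen in the flatten transformation), something like:

file_name
DemoFile1
DemoFile2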

Option 2:

Input: (screenshot)

  1. Connect the data flow source to the source dataset and provide the path of the folder whose file list you want. Do not provide a file name; this way, the source reads the data of all files in the folder at once. (A data flow script sketch of this whole flow is shown after step 7.)


  2. In Source options, provide a new column name in the 'Column to store file name' property; the source file name of each row will be stored in this column.


  3. In the source data preview, you can see the new column containing the full file path, along with the data from all the files in the folder.


  4. Add a Select transformation to the source output to remove all columns except the file name column (File_name).


  5. Add a derived column transformation after the Select transformation to extract the file name from the path (the column that stores the file name contains the full path of the file):

reverse(dropRight(reverse(File_name), length(File_name) - instr(reverse(File_name), '/') + 1))


  6. Add an aggregate transformation to get the distinct values of the File_name column.


  7. Add a Sink transformation at the end and connect it to the sink dataset. In the sink settings, you can provide the file name for the output data.

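Putting these steps together, the Option 2 data flow would look roughly like the following data flow script. The stream names are illustrative; rowcount is a hypothetical helper aggregate added only so that the group-by yields distinct File_name values, and it is simply not mapped in the sink; the derived column reuses the expression from step 5:

    source(allowSchemaDrift: true,
        validateSchema: false,
        rowUrlColumn: 'File_name') ~> Source1
    Source1 select(mapColumn(
            File_name
        )) ~> Select1
    Select1 derive(File_name = reverse(dropRight(reverse(File_name), length(File_name) - instr(reverse(File_name), '/') + 1))) ~> DerivedColumn1
    DerivedColumn1 aggregate(groupBy(File_name),
        rowcount = count()) ~> Aggregate1
    Aggregate1 sink(allowSchemaDrift: true,
        validateSchema: false,
        mapColumn(
            File_name
        )) ~> Sink1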

Sink preview: (screenshot)

Output: (screenshot)

Upvotes: 3
