Cornel Verster

Reputation: 1781

Azure Data Factory - Reading JSON Array and Writing to Individual CSV files

I have many JSON files landing in my data lake, all with the same format:

[
  {
    "AzureContainer": "someContainer",
    "AzureFolderName": "someFolder",
    "AzureFilename": "myFilename",
    "Timestamp: "123123213",
    "ItemName": "myItem",
    "Value": 567
  },
  {
    "AzureContainer": "someContainer",
    ...
  },
  ...
] 

What I want to do is use the information in these items to append to CSV files in my data lake (or create them if they don't exist) in the format:

[Timestamp],[ItemName],[Value]
[Timestamp],[ItemName],[Value]
....

I also need to generate the CSV filename as ddmmyyyy_[AzureFilename].csv
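For example, a file with AzureFilename "myFilename" that lands on 3 May 2022 should produce 03052022_myFilename.csv.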

What I've done thus far is create a lookup that reads all of the files within my landing blob. I can then extract each of these individual JSON objects. Now, I need to use the information in the JSON object to write to the specified location in my Azure storage account (using AzureContainer, AzureFolderName and AzureFilename), and I need to create a CSV entry using Timestamp, ItemName and Value.

How can I do this? I'm trying to find something that allows me to write to a CSV without having to assign a source dataset. In other words, I want to format the CSV content myself and use the Azure parameters to append to the CSV file in the specified location.

Upvotes: 0

Views: 2786

Answers (1)

NiharikaMoola

Reputation: 5074

You can use a Data Flow activity inside a ForEach loop to write the source JSON files to individual CSV files.

JSON Source:


Create a source JSON dataset and pass the filename dynamically.

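For example, assuming the dataset has a string parameter named sourceFileName (the name is illustrative), the File name field in the dataset connection can be set to:

@dataset().sourceFileName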

In the pipeline, connect the source dataset to a Get Metadata activity to get the list of files from the source folder. In the Field list, select Child items. This returns the list of all files in the folder.

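A minimal sketch of the activity definition (the dataset name SourceJsonDataset is illustrative, and any dataset parameters are omitted for brevity):

{
  "name": "Get Metadata1",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "SourceJsonDataset",
      "type": "DatasetReference"
    },
    "fieldList": ["childItems"]
  }
}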

Output of Get Metadata:

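With 2 files in the source folder, it has this shape (filenames are illustrative):

{
  "childItems": [
    { "name": "file1.json", "type": "File" },
    { "name": "file2.json", "type": "File" }
  ]
}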

Connect the output of Get Metadata to a ForEach loop.

In the ForEach settings, select Sequential, and in Items, add the expression to get the output child items:

@activity('Get Metadata1').output.childItems


Add a Data Flow activity inside the ForEach loop. Inside the data flow, create a string parameter (referenced below as $dynamicSinkCSV) to pass the sink filename from the source.

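In data flow script terms, the declaration is a one-liner like this sketch, assuming the parameter name dynamicSinkCSV used in the sink expression below:

parameters{
    dynamicSinkCSV as string
}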

Add the source connection to the data flow source.


Create a sink connection and add it to the data flow sink.


In the sink settings, set the File name option to "Output to single file" and add an expression in the File name property to generate the sink filename:

concat(toString(currentDate(),'ddMMyyyy'), '_', replace($dynamicSinkCSV, '.json', '.csv'))
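For example, if $dynamicSinkCSV is myFilename.json and the pipeline runs on 3 May 2022, this produces 03052022_myFilename.csv.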


In the mapping, select only the columns that are required: here, Timestamp, ItemName, and Value, to match the required CSV layout.


In the ForEach loop, in the Data Flow activity settings, pass the current item of the ForEach loop as the value of the source filename.


In Parameters, pass the value of the sink filename, which is the same as the source filename (the current item of the ForEach loop).

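Each child item from Get Metadata exposes a name property, so both values can be set with the same expression:

@item().name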

When you execute the pipeline, you can see that the loop iterates twice, since there are 2 files in the source folder.


Finally, the CSV files are generated in the sink folder.

Upvotes: 2
