Reputation: 1781
I have many JSON files that land in my data lake, all with the same format:
[
    {
        "AzureContainer": "someContainer",
        "AzureFolderName": "someFolder",
        "AzureFilename": "myFilename",
        "Timestamp": "123123213",
        "ItemName": "myItem",
        "Value": 567
    },
    {
        "AzureContainer": "someContainer",
        ...
    },
    ...
]
What I want to do is use the information in these items to append to CSV files in my Data Lake (or create them if they don't exist) in the format:
[Timestamp],[ItemName],[Value]
[Timestamp],[ItemName],[Value]
....
I also need to generate the CSV filename as ddmmyyyy_[AzureFilename].csv
What I've done thus far is create a Lookup that reads all of the files within my landing blob, from which I can extract each individual JSON object. Now I need to use the information in each JSON object to write to the specified location in my Azure storage account (using AzureContainer, AzureFolderName and AzureFilename), and I need to create a CSV entry using Timestamp, ItemName and Value.
How can I do this? I'm trying to find something that allows me to write to a CSV without having to assign a source dataset. In other words, I want to format the CSV content myself and use the Azure parameters to append to the CSV file in the specified location.
Upvotes: 0
Views: 2786
Reputation: 5074
You can use a Data Flow activity inside a ForEach loop to write the source JSON files to individual CSV files.
JSON Source:
Create a source JSON dataset and pass the filename dynamically.
In the pipeline, connect the source dataset to a Get Metadata activity to get the list of files from the source folder. In Field list, select Child items. This returns the list of all files in the folder.
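For reference, a minimal sketch of what this Get Metadata activity might look like in pipeline JSON (the dataset name SourceJsonFolder is an assumption, not from the original setup):

{
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "SourceJsonFolder", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
    }
}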
Output of Get Metadata:
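It contains a childItems array, roughly of this shape (the filenames here are illustrative):

{
    "childItems": [
        { "name": "myFilename1.json", "type": "File" },
        { "name": "myFilename2.json", "type": "File" }
    ]
}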
Connect the output of Get Metadata to a ForEach loop:
In the ForEach settings, select Sequential and, in Items, add the expression to get the child items from the output.
@activity('Get Metadata1').output.childItems
Add a Data Flow activity inside the ForEach loop. Inside the data flow, create a parameter to pass the sink filename from the source.
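In the data flow script, the parameter declaration would look roughly like this (the name dynamicSinkCSV matches the expression used below; the empty string default is an assumption):

parameters{
    dynamicSinkCSV as string ('')
}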
Add the source connection to the data flow source.
Create a sink connection and add it to the data flow sink.
In the sink settings, set the File name option to "Output to single file" and, in the Filename property, add an expression to generate the sink filename.
concat(toString(currentDate(),'ddMMyyyy'),"_",replace($dynamicSinkCSV, '.json', '.csv'))
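As an illustration, if the pipeline ran on 5 March 2023 and the current file were myFilename.json, this expression would produce 05032023_myFilename.csv, matching the required ddmmyyyy_[AzureFilename].csv pattern.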
In the mapping, select only the required columns (here, Timestamp, ItemName and Value).
In the ForEach loop, in the Data Flow activity settings, pass the current item of the loop as the source filename. In Parameters, pass the value of the sink filename, which is the same as the source filename (the current item of the ForEach loop).
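Both can be set with the same pipeline expression, since each child item returned by Get Metadata exposes a name property:

@item().name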
When you execute the pipeline, you can see the loop iterates twice, since there are 2 files in the source folder.
CSV files generated in Sink folder:
Upvotes: 2