Reputation: 122
I have an ADF pipeline loading raw log data as JSON files into a Data Lake Gen 2 container.
We now want to extract information from those JSON files, and I am trying to find the best way to do that. I found that Azure Data Lake Analytics and U-SQL scripts are pretty powerful and also cheap, but they come with a steep learning curve.
Is there a recommended way to parse the JSON files and extract information from them? Would Data Lake tables be adequate storage for the extracted information, and could they then act as a source for downstream reporting processes?
And finally, will Azure Data Factory ever be able to parse JSON files with nested arrays?
Upvotes: 3
Views: 1945
Reputation: 6043
We can parse JSON files and extract information with a mapping data flow. JSON files with nested arrays can be unrolled with the Flatten transformation in the mapping data flow.
JSON example:
{
    "count": 1,
    "value": [{
        "obj": 123,
        "lists": [{
            "employees": [{
                "name": "",
                "id": "001",
                "tt_1": 0,
                "tt_2": 4,
                "tt3_": 1
            },
            {
                "name": "",
                "id": "002",
                "tt_1": 10,
                "tt_2": 8,
                "tt3_": 1
            }]
        }]
    }]
}
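Outside ADF, the same unrolling can be sketched in a few lines of Python with pandas.json_normalize, which is a convenient way to preview what the Flatten transformation will produce (the file name here is an illustrative assumption):

import json
import pandas as pd

# Load one raw log document (the path is assumed for illustration).
with open("raw_log.json") as f:
    doc = json.load(f)

# Unroll value[] -> lists[] -> employees[], carrying "obj" along from
# each value element; this mirrors what the Flatten transformation does.
df = pd.json_normalize(
    doc["value"],
    record_path=["lists", "employees"],
    meta=["obj"],
)
print(df)
# One row per employee:
#   name   id  tt_1  tt_2  tt3_  obj
# 0        001     0     4     1  123
# 1        002    10     8     1  123

The Flatten transformation's Unroll by setting plays the role of record_path here, and a sink in the data flow can then write the flattened rows back to the lake for reporting.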
Flatten transformation settings and output preview:
Mapping data flow follows an extract, load, and transform (ELT) approach and works with staging datasets that are all in Azure. Currently, only a limited set of Azure datasets can be used in a source transformation; Azure Data Lake Storage Gen2 with JSON files is among them.
So I think a mapping data flow in ADF is the easiest way to extract the information, and its output can then act as a source for the downstream reporting process.
Upvotes: 2