Reputation: 5467
I want to transform multiple complex JSON files into one complex JSON file using Azure Data Factory dataflow.
The multiple complex input JSON files are in the following format:
{
"creationDate": "2022-01-19T17:00:17Z",
"count": 2,
"data": [
{
"id": "",
"name": "Project A",
"url": "",
"state": "Open",
"revision": 1,
"visibility": "private",
"lastUpdateTime": "2019-09-23T08:44:45.103Z"
},
{
"id": "",
"name": "Project B",
"url": "",
"state": "Done",
"revision": 1,
"visibility": "private",
"lastUpdateTime": "2019-12-31T09:38:49.16Z"
}
]
}
We want to transform those files to one single json file in the format:
[
{
"date": "2022-01-14",
"count": 2,
"projects": [
{
"name": "Project A",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project B",
"state": "",
"lastUpdateTime": ""
}
]
},
{
"date": "2022-01-17",
"count": 3,
"projects": [
{
"name": "Project A",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project B",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project C",
"state": "",
"lastUpdateTime": ""
}
]
}
]
We were using the derived column with the expression @(name=data.name, state=data.state)
.
Can someone help us how to do this? We tried a lot of things like derived column, first flattening but we can't get it as we like...
Thanks!
Upvotes: 0
Views: 121
Reputation: 5467
The solution on the end was pretty close on what we had.
So our final solution is as follow:
flatten
with Unroll by
set to data
. We also mapped the creationDate
to date
.derived column
called projects
with as expression @(name=name,state=state,lastUpdateTime=lastUpdateTime,url=url)
Group
activity with a Group by
on date
. Set Aggregates
for count
on first(count)
and (this was the solution) set projects
to collect(projects)
.Select
activity which will select the columns date
, count
and projects
.Sort
activity with a sort on date
Ascending.file name option
to output to single file
and partion option
set to single partion
Note:
Because we have a sink to one big json file (output to single file). The sorting wasn't correct written to json. If we debugged (data preview) the dataflow everything was correct. Strange behavior. When we changed the Sort
activity the option Partion option
to Single partion
the json file had the right sort order.
Upvotes: 1