Krumelur
Krumelur

Reputation: 33048

How to flatten a nested JSON array in Azure Data Factory and output the original JSON object as string?

My input (simplified) is coming from many JSON files structured like this:

{
  Type : "Root",
  Id: "R1",
  Nested : [
    { Type : "NestedType1", Id: "N1", SharedAttribute: 1 },
    { Type : "NestedType1", Id: "N2", SharedAttribute: 2 },
    { Type : "NestedType2", Id: "N3", SharedAttribute: 3, NestedType2SpecificAttribute = "foo" }
  ]
}

The important bit is that the nested elements share certain attributes (eg SharedAttribute), but they can have various other attributes (eg NestedType2SpecificAttribute). I cannot capture all attributes in the input schema as they are changing over time.

I want the nested array to be transformed so that it outputs a Kusto table with all common/shared attributes and an additional column with a string representing the nested array's items' JSON.

I'm using a DataFlow to read these JSON files from Data Lake. To extract the array, I added a "Flatten" formatter and selected to unroll by "Nested" and made the unroll root the same. This gives me the expected data:

Type Id SharedAttribute
NestedType1 N1 1
NestedType1 N2 2
NestedType2 N3 3

But I have no idea how to add an additional column called "RawJson" which should contain the source element of the unrolled array, essentially toString(currentItem) to produce a result like this:

Type Id SharedAttribute RawJson
NestedType1 N1 1 { Type : "NestedType1", Id: "N1", SharedAttribute: 1 }
NestedType1 N2 2 { Type : "NestedType1", Id: "N2", SharedAttribute: 2 }
NestedType2 N3 3 { Type : "NestedType2", Id: "N3", SharedAttribute: 3, NestedType2SpecificAttribute = "foo" }

Upvotes: 0

Views: 3059

Answers (2)

Rakesh Govindula
Rakesh Govindula

Reputation: 11234

Adding to @Mark Kromer MSFT, after flatten transformation, you will get the array of objects in the output.

additional column with a string representing the nested array's items' JSON.

Then use derived column transformation to get the output as String.

Flatten:

enter image description here

Flatten Output as array of objects:

enter image description here

Derived column:

enter image description here

Output:

enter image description here

Upvotes: 2

Mark Kromer MSFT
Mark Kromer MSFT

Reputation: 3838

In the Flatten transformation's "Input Columns" section, you can include the original array. You should be able to choose the array name that you set to unroll by and you can then rename it to "RawJson" in the Input Columns using the Name As property.

In my example, "coordinates" is the array that I unrolled as well as generated it as part of the output.

enter image description here

Upvotes: 1

Related Questions