Mike S
Mike S

Reputation: 190

Azure Data Flow: Parse nested list of objects from JSON String

I need to parse JSON data from a string inside a Azure Data Flow. So far, I was able to parse all my data using the "Parse" function of the Data Flows. But now I am faced with a list of objects, and I don't know how to parse the values of that "complex array".

My data is looking like this:
There are some metadata fields (here null) and a Base64 encoded Body field. Every JSON document is in a separate JSON file.

The example data:

{
    "metadata": null,
    "Body": "eyJpZCI6ICIxIiwgImNvdW50IjogMTIsICJwcm9qZWN0cyI6IFt7Imd1aWQiOiAiMTIzMTItMTI0MzE0LTEyNDEtMTIzNCIsICJzdGF0dXMiOiAic3RvcHBlZCJ9LHsiZ3VpZCI6ICJjc2lzdi1uZDkyM24tMTM0MS0yMzQxIiwgInN0YXR1cyI6ICJydW5uaW5nIn1dfQ"
}

For clarification, the encoded example data looks like this:

{
    "metadata": null,
    "Body": "{"id": "1", "count": 12, "projects": [{"guid": "12312-124314-1241-1234", "status": "stopped"},{"guid": "csisv-nd923n-1341-2341", "status": "running"}]}"
}

My goal is to have a parquet file containing the data from the Body. So there should be three columns: id, count, projects. Projects should contain a list of complex objects.

The final result should look like this:
Goal
This is the result, when I load a JSON file, where the Body data is not encoded, but plain JSON containing the list of objects. Note, that this is not feasible for the original problem, where the JSON data is Base64 encoded.

When I load the example data into a dataflow the projection looks like this (as expected):
projection

First, I need to decode the Base64 Body and then I can parse the JSON string:
data flow

Decoding Body is done here:
Decode Body

And basic parsing is done here:
enter image description here

How can I parse the field "projects"? Given that every object in the list of the array field has the same schema.

I already tried parsing the field "projects" as string and add another Parse step to parse this string as "Array of documents", but the results are only Null values..

parse projects string

result of parse projects

Upvotes: 0

Views: 4064

Answers (1)

Mike S
Mike S

Reputation: 190

The final Dataflow looks like this:
FinalDataFlow

The parsing has to be splitted in several parts.

  • First, the array needs to be parsed as a string array
  • The string array has to be exploded
  • Now, every array entry can be parsed
  • The exploded array can be collected back to gain the structure I wanted to have
  • Finally, the exploded and recollected data can be rejoined to the original data

Parse array of objects as string array

Remember: The data I want to parse looks like this:

{
    "metadata": null,
    "Body": "{"id": "1", "count": 12, "projects": [{"guid": "12312-124314-1241-1234", "status": "stopped"},{"guid": "csisv-nd923n-1341-2341", "status": "running"}]}"
}

So first I need to parse the "Body" column, which is BodyDecoded, since I first had to decode from Base64. The array of objects has to be parsed as array of strings. Hence, the "Output column type" of the Parse step looks like this:

(id as string,
count as integer,
projects as string[])

The values are written in the BodyContent column. To make the coming steps easier first the hierarchy is flattened. It is possible to use a column pattern for that, but I will do it explicitly here:
FlattenHierachy

Also, the projects column is now renamed to projectsStringArray

Explode string array

Now the projectsStringArray can be exploded using the "Flatten" step. ExplodeStringArray

The column id is also taken here, to be able to recollect the array later.

The result looks like this:
DataAfterExplodeStringArray

Parse JSON strings

Now every string can be parsed by a "Parse" step, as usual

(guid as string,
 status as string)

Collect parsed objects

The parsed objects can be aggregated in lists again, using the "collect" function. CollectArray

Rejoin to original data

To get the desired structure the collected column has to be joined to the original data. The id column can be used to join the data back.
Join

Result

After a final select, the structure looks as required:
Finally

Remarks: Thanks to Erik from Microsoft for his help! I was too focused on solving it using only the parsing step, that I didn't think about other ways to tackle the problem..

Upvotes: 1

Related Questions