SUMguy

Reputation: 1673

Azure Data Factory - traverse JSON array with multiple rows

I have a REST API that outputs JSON data similar to this example:

{
    "GroupIds": [
        "1234",
        "2345",
        "3456",
        "4567"
    ],
    "Id": "w5a19-a493-bfd4-0a0c8djc05",
    "Name": "Test Item",
    "Description": "test item description",
    "Notes": null,
    "ExternalId": null,
    "ExpiryDate": null,
    "ActiveStatus": 0,
    "TagIds": [
        "784083-4c77-b8fb-0135046c",
        "86de96-44c1-a497-0a308607",
        "7565aa-437f-af36-8f9306c9",
        "d5d841-1762-8c14-d8420da2",
        "bac054-2b6e-a19b-ef5b0b0c"
    ],
    "ResourceIds": []
}

Using ADF, I want to parse this JSON object and insert a row for each value in the GroupIds array, along with the object's Id and Name. So ultimately the above JSON should translate to a table like this:

GroupID  Id                          Name
1234     w5a19-a493-bfd4-0a0c8djc05  Test Item
2345     w5a19-a493-bfd4-0a0c8djc05  Test Item
3456     w5a19-a493-bfd4-0a0c8djc05  Test Item
4567     w5a19-a493-bfd4-0a0c8djc05  Test Item

Is there some configuration I can use in the Copy Activity settings to accomplish this?

Upvotes: 2

Views: 3216

Answers (2)

wBob

Reputation: 14379

I tend to use a more ELT pattern for this, i.e. passing the JSON to a Stored Procedure activity and letting the SQL database handle the JSON. This assumes you already have access to a SQL DB, which is very capable with JSON.

A simplified example:

DECLARE @json NVARCHAR(MAX) = '{
    "GroupIds": [
        "1234",
        "2345",
        "3456",
        "4567"
    ],
    "Id": "w5a19-a493-bfd4-0a0c8djc05",
    "Name": "Test Item",
    "Description": "test item description",
    "Notes": null,
    "ExternalId": null,
    "ExpiryDate": null,
    "ActiveStatus": 0,
    "TagIds": [
        "784083-4c77-b8fb-0135046c",
        "86de96-44c1-a497-0a308607",
        "7565aa-437f-af36-8f9306c9",
        "d5d841-1762-8c14-d8420da2",
        "bac054-2b6e-a19b-ef5b0b0c"
    ],
    "ResourceIds": []
}'

-- Parse the top-level object once, keeping the GroupIds array as raw JSON,
-- then CROSS APPLY over that array to get one row per GroupId,
-- repeating Id and Name on every row
SELECT
    g.[value] AS groupId,
    m.Id,
    m.[Name]
FROM OPENJSON( @json, '$' )
WITH
    (
    Id          VARCHAR(50) '$.Id',
    [Name]      VARCHAR(50) '$.Name',
    GroupIds    NVARCHAR(MAX) AS JSON
    ) m
    CROSS APPLY OPENJSON( m.GroupIds ) g;

You could wrap this in a stored procedure that takes @json as a parameter and change the SELECT to an INSERT, along the lines of the sketch below.
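
A minimal sketch of that stored procedure, assuming a hypothetical target table dbo.GroupItem (GroupId, Id, Name); the table and procedure names are illustrative, not from the original answer:

CREATE PROCEDURE dbo.usp_InsertGroupItems
    @json NVARCHAR(MAX)
AS
BEGIN
    -- dbo.GroupItem is a placeholder target table for this sketch
    INSERT INTO dbo.GroupItem ( GroupId, Id, [Name] )
    SELECT
        g.[value],
        m.Id,
        m.[Name]
    FROM OPENJSON( @json, '$' )
    WITH
        (
        Id          VARCHAR(50) '$.Id',
        [Name]      VARCHAR(50) '$.Name',
        GroupIds    NVARCHAR(MAX) AS JSON
        ) m
        CROSS APPLY OPENJSON( m.GroupIds ) g;
END

The Stored Procedure activity would then call this, passing the REST response into the @json parameter.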

My results:

groupId  Id                          Name
1234     w5a19-a493-bfd4-0a0c8djc05  Test Item
2345     w5a19-a493-bfd4-0a0c8djc05  Test Item
3456     w5a19-a493-bfd4-0a0c8djc05  Test Item
4567     w5a19-a493-bfd4-0a0c8djc05  Test Item

I worked through a very similar example with more screenshots here, which is worth a look. It's a different pattern to using Mapping Data Flows, but if you already have SQL available then it makes sense to use it rather than firing up separate compute at additional cost. If you are not loading to a SQL DB or don't have access to one, then the Mapping Data Flows approach might make sense for you.

Upvotes: 3

Mohana B C

Reputation: 5487

You can use a Data Flow activity to get the desired result.

First add the REST API source, then use a Select transformer to keep the required columns.

After this, add a Derived Column transformer and use the unfold function to flatten the JSON array, as sketched below.
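
As a rough sketch, the derived column unrolls the array into one row per value; the column name GroupIds comes from the sample payload, and the exact expression should be verified in your own data flow:

GroupId = unfold(GroupIds)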

Another way is to use the Flatten transformation, sketched below.
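
A rough sketch of the Flatten settings, with column mappings that are illustrative and based on the sample payload:

Unroll by:      GroupIds
Input columns:  GroupIds -> GroupID
                Id       -> Id
                Name     -> Name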

Upvotes: 4
