Cristian Iosub
Cristian Iosub

Reputation: 157

USQL call to read json array without a key

I have a nested json that I am trying to flatten in usql. I cannot share the data, but the structure is similar to this.

{
    "userlist": [user1, user1],
    "objects": {
        "largeobjects": [object1, object2, object3]
        "smallobjects": [s_object1, s_object2]
    },
    "applications": [{
            "application": sdq3ds5dsa
        }, {
            "application": dksah122j4
        }, {
            "application": sadsw2dq2s
        }, {
            "application": pro3dfdsn3
        }
    ],
    "date" : 12344232,
    "timezone" : "Asia",
    "id" : "sad2ddssa2",
    "admin": {
        "lang": "eng",
        "country": "us",
    }
}

I am using the custom jsonoutputter (https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats) to extract from the json file and the jsontuple function to extract the values. My problem is that the function uses sql map which generates key value pairs. This works for the situations where I have a key, but it throws an error when I try to use that function to get the values from the no key array.

Any suggestion regarding how to solve this would be greatly appreciated.

EDIT This is the output I am looking after:

sad2ddssa2, object1, 12344232, "Asia", "eng", "us",
sad2ddssa2, object2, 12344232, "Asia", "eng", "us"

Upvotes: 2

Views: 726

Answers (2)

wBob
wBob

Reputation: 14379

I was able to get this to work using the NewtonSoft MultiLevelJsonExtractor extractor and this fixed-up JSON file:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @inputFile string = @"\input\yourInputJSON.json";
DECLARE @outputFile string = @"\output\output.csv";


@input =
    EXTRACT id string,
            largeobjects string,
            date string,
            timezone string,
            lang string,
            country string
    FROM @inputFile
    USING new MultiLevelJsonExtractor("objects", false,
          "id",
          "largeobjects",
          "date",
          "timezone",
          "admin.lang",
          "admin.country"
          );

// Convert the JSON column to SQL MAP to multiple rows
@working =
    SELECT id,
           JsonFunctions.JsonTuple(largeobjects).Values AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @input;


// Explode the JSON SQL MAP
@output =
    SELECT id,
           x.y AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @working
         CROSS APPLY
             EXPLODE(largeobject) AS x(y);


OUTPUT @output
TO @outputFile
USING Outputters.Csv(quoting : false);

My results:

My results

I would say this is probably a bit safer than using a roll-your-own method as the NewtonSoft library is specifically for manipulating JSON and is tried and tested.

Upvotes: 1

Miguel Domingues
Miguel Domingues

Reputation: 450

First option

Try to use PROSE within your u-sql. Use the PROSE's c# nuget to process data and do complex extractions. This is a very powerful AI package. See the videos and examples here: https://microsoft.github.io/prose

Second option

Create a c# function to process your json. Something like this one, adapt this sample to your custom extraction request using c# json api's:

/* Formats the array of values into a named json array. */

DECLARE @JsonArray Func<SqlArray<string>, string, string> = (data, name) => 
    {
        StringBuilder buffer = new StringBuilder();

        buffer.Append("{\r\n\t\"" + name + "\": [\r\n");

        for (int i = 0; i < data.Count(); i++)
        {
            if (i > 0)
            {
                buffer.Append(",\r\n");
            }

            buffer.Append("\t\"" + data[i] + "\"");
        }

        buffer.Append("\r\n\t]\r\n}");

        return buffer.ToString();
    };

/* Format the array containing groups of comma separated values into a named json array */

@Query = 
    SELECT
        @JsonArray(SubscriptionArray, "subscriptionList") AS JsonArray
    FROM @subscriptionsQuery1;

Third option

Try this approach, after adapting it to your needs:

/* For each json line create a json map (SqlMap) */

@subscriptionsQuery1 = 
    SELECT 
           JsonFunctions.JsonTuple(JsonLine) AS JsonMap
    FROM @SubscriptionsExtractor AS t;

/* For each json map get the required property value */

@subscriptionsQuery1 = 
    SELECT DISTINCT
           JsonMap["alias"] AS Subscription
    FROM @subscriptionsQuery1 AS t;

/* Join the value of all rows into a single row containing an array of all values */

@subscriptionsQuery1 = 
    SELECT
        ARRAY_AGG<string>(Subscription) AS SubscriptionArray
    FROM @subscriptionsQuery1 AS t;

Upvotes: 1

Related Questions