Reputation: 157
I have a nested JSON file that I am trying to flatten in U-SQL. I cannot share the data, but the structure is similar to this:
{
    "userlist": ["user1", "user1"],
    "objects": {
        "largeobjects": ["object1", "object2", "object3"],
        "smallobjects": ["s_object1", "s_object2"]
    },
    "applications": [
        { "application": "sdq3ds5dsa" },
        { "application": "dksah122j4" },
        { "application": "sadsw2dq2s" },
        { "application": "pro3dfdsn3" }
    ],
    "date": 12344232,
    "timezone": "Asia",
    "id": "sad2ddssa2",
    "admin": {
        "lang": "eng",
        "country": "us"
    }
}
I am using the custom JSON extractor from the U-SQL samples (https://github.com/Azure/usql/tree/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats) to read the JSON file and the JsonTuple function to extract the values. My problem is that JsonTuple returns a SQL.MAP, which is built from key-value pairs. This works in the situations where I have a key, but it throws an error when I try to use that function to get the values from the keyless array.
Any suggestion regarding how to solve this would be greatly appreciated.
EDIT: This is the output I am looking for:
sad2ddssa2, object1, 12344232, "Asia", "eng", "us"
sad2ddssa2, object2, 12344232, "Asia", "eng", "us"
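In other words, the scalar fields should be repeated once per element of the keyless largeobjects array. A minimal Python sketch of that target shape (illustration only, not U-SQL; only two largeobjects are shown to match the rows above):

```python
import json

# Simplified stand-in for the real document (structure as described above).
doc = {
    "objects": {"largeobjects": ["object1", "object2"]},
    "date": 12344232,
    "timezone": "Asia",
    "id": "sad2ddssa2",
    "admin": {"lang": "eng", "country": "us"},
}

# One output row per array element, with the scalar fields repeated.
rows = [
    (doc["id"], obj, doc["date"], doc["timezone"],
     doc["admin"]["lang"], doc["admin"]["country"])
    for obj in doc["objects"]["largeobjects"]
]
for row in rows:
    print(", ".join(str(v) for v in row))
```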
Upvotes: 2
Views: 726
Reputation: 14379
I was able to get this to work using the Newtonsoft-based MultiLevelJsonExtractor and this fixed-up JSON file:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @inputFile string = @"\input\yourInputJSON.json";
DECLARE @outputFile string = @"\output\output.csv";
@input =
    EXTRACT id string,
            largeobjects string,
            date string,
            timezone string,
            lang string,
            country string
    FROM @inputFile
    USING new MultiLevelJsonExtractor("objects", false,
        "id",
        "largeobjects",
        "date",
        "timezone",
        "admin.lang",
        "admin.country"
    );
// Convert the JSON array column to a SQL.MAP so it can be split into multiple rows
@working =
    SELECT id,
           JsonFunctions.JsonTuple(largeobjects).Values AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @input;
// Explode the SQL.MAP values into one row per element
@output =
    SELECT id,
           x.y AS largeobject,
           date,
           timezone,
           lang,
           country
    FROM @working
         CROSS APPLY
         EXPLODE(largeobject) AS x(y);
OUTPUT @output
TO @outputFile
USING Outputters.Csv(quoting : false);
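The JsonTuple(...).Values plus CROSS APPLY EXPLODE steps can be pictured like this (a rough Python analogue; that JsonTuple keys array elements by position is an assumption about the samples library, so verify against your version):

```python
import json

# The extractor returns the 'largeobjects' column as a raw JSON array string.
largeobjects = '["object1", "object2", "object3"]'

# Rough analogue of JsonFunctions.JsonTuple on an array: elements keyed
# by their position (assumed behavior of the samples library).
json_map = {str(i): v for i, v in enumerate(json.loads(largeobjects))}

# .Values, then CROSS APPLY EXPLODE: one output row per element,
# with the scalar columns (here just id) repeated.
rows = [("sad2ddssa2", v) for v in json_map.values()]
```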
I would say this is probably a bit safer than a roll-your-own method, as the Newtonsoft library is specifically designed for manipulating JSON and is tried and tested.
Upvotes: 1
Reputation: 450
First option
Try using PROSE within your U-SQL. Use PROSE's C# NuGet package to process the data and do complex extractions. It is a very powerful AI package; see the videos and examples here: https://microsoft.github.io/prose
Second option
Create a C# function to process your JSON. Adapt a sample like the one below to your extraction needs using the C# JSON APIs:
/* Formats the array of values into a named JSON array. */
DECLARE @JsonArray Func<SqlArray<string>, string, string> = (data, name) =>
{
    StringBuilder buffer = new StringBuilder();
    buffer.Append("{\r\n\t\"" + name + "\": [\r\n");
    for (int i = 0; i < data.Count(); i++)
    {
        if (i > 0)
        {
            buffer.Append(",\r\n");
        }
        buffer.Append("\t\"" + data[i] + "\"");
    }
    buffer.Append("\r\n\t]\r\n}");
    return buffer.ToString();
};
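For intuition, the same string-building logic mirrored in Python (illustration only; the input values and array name are made up):

```python
def json_array(data, name):
    # Mirror of the C# StringBuilder logic above: wrap the values in a
    # named JSON array, using the same \r\n and tab layout.
    parts = ['{\r\n\t"' + name + '": [\r\n']
    for i, value in enumerate(data):
        if i > 0:
            parts.append(",\r\n")
        parts.append('\t"' + value + '"')
    parts.append("\r\n\t]\r\n}")
    return "".join(parts)

print(json_array(["sub1", "sub2"], "subscriptionList"))
```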
/* Format the array containing groups of comma-separated values into a named JSON array. */
@Query =
    SELECT @JsonArray(SubscriptionArray, "subscriptionList") AS JsonArray
    FROM @subscriptionsQuery1;
Third option
Try this approach, after adapting it to your needs:
/* For each JSON line, create a JSON map (SqlMap). */
@subscriptionsQuery1 =
    SELECT JsonFunctions.JsonTuple(JsonLine) AS JsonMap
    FROM @SubscriptionsExtractor AS t;
/* From each JSON map, get the required property value. */
@subscriptionsQuery1 =
    SELECT DISTINCT JsonMap["alias"] AS Subscription
    FROM @subscriptionsQuery1 AS t;
/* Join the values of all rows into a single row containing an array of all values. */
@subscriptionsQuery1 =
    SELECT ARRAY_AGG<string>(Subscription) AS SubscriptionArray
    FROM @subscriptionsQuery1 AS t;
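Conceptually, the three statements do: parse each JSON line into a map, pull one property with DISTINCT, then aggregate into a single array. A Python sketch of that pipeline (the alias values are invented for illustration):

```python
import json

json_lines = [
    '{"alias": "subA"}',
    '{"alias": "subB"}',
    '{"alias": "subA"}',  # duplicate, removed by the DISTINCT step
]

# Step 1: parse each line into a map (JsonFunctions.JsonTuple analogue).
maps = [json.loads(line) for line in json_lines]

# Step 2: pull the required property and de-duplicate (SELECT DISTINCT).
subscriptions = sorted({m["alias"] for m in maps})

# Step 3: collect all values into a single array (ARRAY_AGG analogue).
subscription_array = list(subscriptions)
print(subscription_array)
```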
Upvotes: 1