Genovedo
Genovedo

Reputation: 11

SNOWFLAKE SQL, FLATTEN path parameter for Array Nested in first level of Variant

There is a Variant field "events" that is made up of an Array storing Objects(aka dictionaries, key-value pairs), as per the below:

[
{ "field_name": "status",  "id": 987418431597, "previous_value": "new", "type": "Change", "value": "pending"},
{ "field_name": "360020024138", "id": 987418431617, "previous_value": null, "type": "Change", "value": "#55927" },
{ "field_name": "360016698218", "id": 987418431637, "previous_value": null, "type": "Change", "value": "0681102386"},
{ "field_name": "360016774537", "id": 987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212"}
]

This field belongs to an event log, and I am trying to use the content of "events" as a filter to get the related timestamps and ids.

Through the Snowflake documentation on Flatten I found out that the recursive => True parameter allows me to expand the Variant all the way down to its nested objects, but with hopes of optimising code, I wanted to use the path parameter, to selectively expand "events" only for the Objects I was interested in.

However, for some reason, Flatten does not allow me to pass a numeric path to identify the Array index of Object that I want to expand, as:

select b.* 
from "event_log" a
    ,lateral flatten (input => a."events", path => 0) b limit 100;

returns: invalid type [NUMBER(1,0)] for parameter 'path'

and

select b.* 
from "event_log" a
   , lateral flatten (input => a."events", path => [0]) b limit 100;

returns: Syntax error: unexpected '['. (line 162)

Ironically, when using recursive => True, the b.path field represent indexes like this [i].

The example in the SFlake docs makes an example of the use of the path parameter with an Object that stores Arrays, whereas here "events" is made up of an Array of Objects, so I actually do not have any working example for this type of Variants.

Upvotes: 1

Views: 1614

Answers (2)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25968

So you have an array in each row. The array has many objects with the same "structure". Getting the ID makes sense as there is one in every object.

So you could just access it.

select a."events"[0]:id::number as id
from "event_log" as a
limit 100;

gives:

A."EVENTS"[0]:ID
987418431597

But given each object's ID looks to be different how do you know you are getting to correct ID's it would seem to make more sense to use flatten to unroll the array. and access the object elements.

so in this "2 rows" of 4 fields:

with "event_log"("events") as (
    select parse_json(column1) from values
    ('[
        { "field_name": "status",  "id": 0987418431597, "previous_value": "new", "type": "Change", "value": "pending"},
        { "field_name": "360020024138", "id": 0987418431617, "previous_value": null, "type": "Change", "value": "#55927" },
        { "field_name": "360016698218", "id": 0987418431637, "previous_value": null, "type": "Change", "value": "0681102386"},
        { "field_name": "360016774537", "id":0987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212"}
        ]'),
    ('[
        { "field_name": "status",  "id": 1987418431597, "previous_value": "new", "type": "Change", "value": "pending"},
        { "field_name": "360020024138", "id":1987418431617, "previous_value": null, "type": "Change", "value": "#55927" },
        { "field_name": "360016698218", "id": 1987418431637, "previous_value": null, "type": "Change", "value": "0681102386"},
        { "field_name": "360016774537", "id":1987418431657, "previous_value": null, "type": "Change", "value": "89367031562011632212"}
        ]')
)
select b.seq as input_row
    ,b.index as array_index
    ,b.value:field_name::text as field_name
    ,b.value:id::number as id
    ,b.value:previous_value::text as previous_value
    ,b.value:type::text as type
    ,b.value:value::text as value
from "event_log" as a
    ,lateral flatten(input=>a."events") b
;

we get:

INPUT_ROW ARRAY_INDEX FIELD_NAME ID PREVIOUS_VALUE TYPE VALUE
1 0 status 987418431597 new Change pending
1 1 360020024138 987418431617 Change #55927
1 2 360016698218 987418431637 Change 0681102386
1 3 360016774537 987418431657 Change 89367031562011632212
2 0 status 1987418431597 new Change pending
2 1 360020024138 1987418431617 Change #55927
2 2 360016698218 1987418431637 Change 0681102386
2 3 360016774537 1987418431657 Change 89367031562011632212

Upvotes: 1

Lukasz Szozda
Lukasz Szozda

Reputation: 175756

The array index should be provided as input:

select b.* 
from "event_log" a,
lateral flatten (input => a."events"[1]) b
limit 100;

Sample:

CREATE OR REPLACE TABLE "event_log"("events" VARIANT)
AS
SELECT '[   {     "field_name": "status",     "id": 987418431597,     "previous_value": "new",     "type": "Change",     "value": "pending"   },   {     "field_name": "360020024138",     "id": 987418431617,     "previous_value": null,     "type": "Change",     "value": "#55927"   },   {     "field_name": "360016698218",     "id": 987418431637,     "previous_value": null,     "type": "Change",     "value": "0681102386"   },   {     "field_name": "360016774537",     "id": 987418431657,     "previous_value": null,     "type": "Change",     "value": "89367031562011632212"   } ]';

Output:

enter image description here

Upvotes: 1

Related Questions