Reputation: 1005
I've been struggling with this for a long time. I have the following table...
postcode | households | wight cable | wight cable fastest down | B4RN | B4RN fastest down |
---|---|---|---|---|---|
X24 888 | 34 | 1 | 108.2 | 0 | 0 |
BT36 7JU | 17 | 0 | 0 | 1 | 274.23 |
What I want to do is output JSON like the following (one row shown here):
{"postcode": "X24 888",
 "households": 34,
 "providers": [{"name": "wight cable",
                "fastest_down": 108.2,
                "present": 1},
               {"name": "B4RN",
                "fastest_down": 0,
                "present": 0}]
}
There are actually about 50 columns like this. I can get the list of them with a command like SHOW COLUMNS LIKE '%fastest down%' IN TABLE TABLE1, but I'm struggling with looping through the table to collect the data from these columns, and with producing the particular nested JSON structure.
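For reference, the output of the SHOW command can be captured with RESULT_SCAN, so the column list is at least queryable:
SHOW COLUMNS LIKE '%fastest down%' IN TABLE TABLE1;
-- RESULT_SCAN reads the output of the SHOW command just above
SELECT "column_name"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));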
Here is what I've got so far. Perhaps it'll only take a few small edits at this point to get to what I need.
create or replace function custom_object_assign(o1 VARIANT, o2 VARIANT)
returns VARIANT
language javascript
as 'return Object.assign(O1, O2);';

with t1 AS
(
    SELECT OBJECT_CONSTRUCT(
        '_id', h."ID",
        'providers', array_agg(object_construct(
            'wight cable', h."wight cable",
            'wight cable fastest down', h."wight cable fastest down")
        )) AS dc
    FROM TABLE1 h
    group by "ID"
),
t2 AS
(
    SELECT OBJECT_CONSTRUCT(
        '_id', h."ID",
        'rmpostcode', "rmpostcode"
    ) AS rs
    FROM TABLE1 h
)
SELECT custom_object_assign(dc, rs)
FROM t1
JOIN t2
ON rs:"_id" = dc:"_id"
LIMIT 10;
which is returning something like this:
{ "_id": "786516", "providers": [ { "wight cable": 0 } ], "rmpostcode": "LL65 1SJ" }
Not quite what I needed yet!
Upvotes: 0
Views: 1216
Reputation: 11046
Here's a sample using a stored procedure to read the source table and add rows to a target table. You can see in the code that there are constants to define the source and target tables.
Limitation: Right now the code expects the postcode and households columns, followed by repeating "property" / "property fastest down" column pairs, to sit in fixed ordinal positions. If that won't be the case, you'll need a loop that reads the column names first to discover their ordinal positions.
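A hypothetical sketch of that lookup (not part of the SP below) would be to build a name-to-position map once, then pair columns by name instead of by position:
// Hypothetical: map each column name to its ordinal position
var colPos = {};
for (var i = 1; i <= selectQuery.statement.getColumnCount(); i++) {
    colPos[selectQuery.statement.getColumnName(i)] = i;
}
// A column "X" is a provider flag if "X fastest down" also exists:
// if (colPos.hasOwnProperty(name + " fastest down")) { ... }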
Usage Note: The JavaScript constructs the JSON itself. Since single-row inserts perform poorly in Snowflake, it builds up a JSON array and flattens the array to insert 1000 rows at a time; the INSERT_BUFFER_ROWS constant sets the buffer size. If the JSON string exceeds Snowflake's 16 MB limit the insert will fail, so the buffer may need to be adjusted downward if that happens.
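To illustrate the pattern, each buffered insert boils down to something like this (toy payload, not the real JSON):
insert into NESTED_JSON
select VALUE
from table(flatten(parse_json('[{"a": 1}, {"a": 2}]')));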
Recommendation: If you want to productionize this, you may want to create a stream on your source table. It will make it much easier to process only the new rows using this SP. https://snowflake.pavlik.us/index.php/2020/01/12/snowflake-streams-made-simple. The code would need to be modified to ignore the stream's final three metadata columns, and after processing you'd run a DML to nowhere (e.g. insert into target_table select ... from stream_table where 1 = 0) to advance the stream.
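A minimal sketch of that setup, assuming a stream named FLAT_VALUES_STREAM (the name is illustrative):
create or replace stream FLAT_VALUES_STREAM on table FLAT_VALUES;
-- after processing, advance the stream with a DML that inserts nothing:
insert into NESTED_JSON
select NULL from FLAT_VALUES_STREAM where 1 = 0;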
Here's the SP code to fill the target table to get started.
create or replace table FLAT_VALUES
(
"postcode" string
,"households" int
,"wight cable" int
,"wight cable fastest down" float
,"B4RN" int
,"B4RN fastest down" float
)
;
insert into FLAT_VALUES select 'X24 888', 34, 1, 108.2, 0, 0;
insert into FLAT_VALUES select 'BT36 7JU', 17, 0, 0, 1, 274.23;
create or replace table NESTED_JSON(v variant);
create or replace procedure GET_NESTED_JSON()
returns variant
language javascript
as
$$
var out = {};
const INPUT_TABLE = "FLAT_VALUES";
const OUTPUT_TABLE = "NESTED_JSON";
const INSERT_BUFFER_ROWS = 1000;
// Thin wrapper pairing a statement with its result set and any error.
class Query{
constructor(statement){
this.statement = statement;
}
}
var selectSQL = `select * from ${INPUT_TABLE}`;
var json = [];
var row = {};
var colArr = [];
var subRow = {};
var rowBuffer = 0;
var rowsInserted = 0;
var selectQuery = getQuery(selectSQL);
while (selectQuery.resultSet.next()) {
    row = {};
    rowBuffer++;
    // The first two columns are postcode and households.
    row[selectQuery.statement.getColumnName(1)] = selectQuery.resultSet.getColumnValue(1);
    row[selectQuery.statement.getColumnName(2)] = selectQuery.resultSet.getColumnValue(2);
    colArr = [];
    // Remaining columns come in pairs: "<provider>" and "<provider> fastest down".
    for (var col = 3; col <= selectQuery.statement.getColumnCount(); col = col + 2) {
        subRow = {};
        subRow["name"] = selectQuery.statement.getColumnName(col);
        subRow["fastest_down"] = selectQuery.resultSet.getColumnValue(selectQuery.statement.getColumnName(col) + " fastest down");
        subRow["present"] = selectQuery.resultSet.getColumnValue(selectQuery.statement.getColumnName(col));
        colArr.push(subRow);
    }
    row["providers"] = colArr;
    json.push(row);
    if (rowBuffer == INSERT_BUFFER_ROWS) {
        insertRows(OUTPUT_TABLE, json);
        rowsInserted += rowBuffer;   // count before resetting the buffer
        rowBuffer = 0;
        json = [];                   // clear the buffer so rows aren't re-inserted
    }
}
// Flush any remaining buffered rows.
if (rowBuffer > 0) {
    insertRows(OUTPUT_TABLE, json);
    rowsInserted += rowBuffer;
}
out["ROWS_INSERTED"] = rowsInserted;
return out;
function insertRows(targetTable, json) {
    // Escape backslashes and single quotes so the JSON survives being
    // embedded in a single-quoted SQL string literal.
    var jsonString = JSON.stringify(json).replace(/\\/g, "\\\\").replace(/'/g, "''");
    var insertSQL = `insert into ${targetTable} select VALUE from table(flatten(parse_json('${jsonString}')))`;
    var insertQuery = getQuery(insertSQL);
}
// Executes a SQL statement, returning the Query with its result set or error.
function getQuery(sql){
var cmd = {sqlText: sql};
var query = new Query(snowflake.createStatement(cmd));
try {
query.resultSet = query.statement.execute();
} catch (e) {
query.error = e.message;
}
return query;
}
$$;
call get_nested_json();
select * from NESTED_JSON;
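As a quick sanity check (not part of the SP), you can flatten the providers back out of the nested JSON:
select v:"postcode"::string          as postcode,
       v:"households"::int           as households,
       p.value:"name"::string        as provider,
       p.value:"present"::int        as present,
       p.value:"fastest_down"::float as fastest_down
from NESTED_JSON,
     lateral flatten(input => v:"providers") p;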
Upvotes: 1