user1265628
user1265628

Reputation: 5

How to query multiple JSON document schemas in Snowflake?

Could anyone tell me how to change the Stored Procedure in the article below to recursively expand all the attributes of a json file (multiple JSON document schemas)?

https://support.snowflake.net/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling-part-2

Upvotes: 0

Views: 388

Answers (1)

Greg Pavlik
Greg Pavlik

Reputation: 11086

Craig Warman's stored procedure posted in that blog is a great idea. I asked him if it was okay to refactor his code, and he agreed. I've used the refactored version in the field, so I know the SP well as well as how it works.

It may be possible to modify the SP to work on your JSON. It will depend on whether or not Snowflake types the JSON in your variant column. The way you have it structured, it may not type everything. You can check by running this SQL and seeing if the result set includes all the columns you need:

set VARIANT_TABLE = 'WEATHER';
set VARIANT_COLUMN = 'V';

with MAIN_TABLE as
(
select * from identifier($VARIANT_TABLE) sample (1000 rows)
)
select distinct REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'),'[^a-zA-Z0-9]','_') AS path_name,       -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element").  It also strips any bracket-enclosed array element references (like "[0]")
                typeof(f.value)                                                         AS attribute_type,  -- This generates column datatypes.
                path_name                                                               AS alias_name       -- This generates column aliases based on the path
from
        MAIN_TABLE,
        LATERAL FLATTEN(identifier($VARIANT_COLUMN), RECURSIVE=>true) f
where   TYPEOF(f.value) != 'OBJECT'
        AND NOT contains(f.path, '[');

Be sure to replace the variables to your table and column names. If this picks up the type information for the columns in your JSON, then it's possible to modify this SP to do what you need. If it doesn't but there's a way to modify the query to get it to pick up the columns, that would work too.

If it doesn't pick up the columns, based on Craig's idea I decided to write type inference for non variant (such as strings from CSV log files without type information). Try the SQL above and see what results first.

Upvotes: 0

Related Questions