jyablonski
jyablonski

Reputation: 929

How to flatten out a json list of strings in Snowflake

Hi I'm working in Snowflake with some relational data in a table that also includes a JSON column with a VARIANT data type format. I'm able to manipulate the JSON data and get most of the columns I want, but I'm having trouble with 1 that is a list (or array?) of data.

Below is what the VARIANT column looks like in its raw JSON format. This column is called product_attributes and the key I'm having problems with is web_categories.

{
  "data": {
    "availability": "available",
    "product_company": "macys",
    "id": "22345897290",
    "price": 5.99,
    "web_categories": [
      [
        "Beauty",
        "Makeup",
        "Eye Makeup",
        "Brows"
      ]
    ]
  }
}

I basically want to like group by that id attribute and then return a distinct count of the number of web categories for each individual product. In this case I'd expect that to return 4.

select product_attributes:data:id::string as id, count(distinct(t1.value)) as number_of_categories
from mytable, table(flatten(product_attributes:data:web_categories)) t1
group by 1;

But this query doesnt work, the value column that is returned by the table flatten command looks like ["Beauty", "Makeup", "Eye Makeup", "Brows"] so when it runs the count operation it only returns 1. Do I have to like split the string up by (",") and go that route? It's difficult because this isn't a normal attribute with elements broken up by {} which is what most examples look like, it's just using [].

It looks like there are 2 methods that are used to break down JSON data but I've tried using both and can't seem to get what I want.

lateral flatten(input => product_attributes:data:web_categories) table(flatten(product_attributes:data:web_categories)) t1

If anyone can help me out I'd appreciate it!

Upvotes: 1

Views: 1364

Answers (1)

Nat Taylor
Nat Taylor

Reputation: 1108

Here is one approach. You could do this in 1 flatten if size(web_categories)==1 always by doing web_catgories[0] in the first flatten

with mytable as (select parse_json($1) product_attributes from values ('{
  "data": {
    "availability": "available",
    "product_company": "macys",
    "id": "22345897290",
    "price": 5.99,
    "web_categories": [
      [
        "Beauty",
        "Makeup",
        "Eye Makeup",
        "Brows"
      ]
    ]
  }}'))
  
select product_attributes:data:id::string as id, count(distinct(t1.value)) as number_of_categories
from mytable, lateral flatten(product_attributes:data:web_categories) t0, lateral flatten(t0.value) t1
group by 1;

ID  NUMBER_OF_CATEGORIES
22345897290 4

Upvotes: 2

Related Questions