Reputation: 929
Hi I'm working in Snowflake with some relational data in a table that also includes a JSON column with a VARIANT data type format. I'm able to manipulate the JSON data and get most of the columns I want, but I'm having trouble with 1 that is a list (or array?) of data.
Below is what the VARIANT column looks like in its raw JSON format. This column is called product_attributes
and the key I'm having problems with is web_categories
.
{
"data": {
"availability": "available",
"product_company": "macys",
"id": "22345897290",
"price": 5.99,
"web_categories": [
[
"Beauty",
"Makeup",
"Eye Makeup",
"Brows"
]
]
}
}
I basically want to like group by that id
attribute and then return a distinct count of the number of web categories for each individual product. In this case I'd expect that to return 4.
select product_attributes:data:id::string as id, count(distinct(t1.value)) as number_of_categories
from mytable, table(flatten(product_attributes:data:web_categories)) t1
group by 1;
But this query doesnt work, the value column that is returned by the table flatten command looks like ["Beauty", "Makeup", "Eye Makeup", "Brows"]
so when it runs the count operation it only returns 1. Do I have to like split the string up by (",") and go that route? It's difficult because this isn't a normal attribute with elements broken up by {} which is what most examples look like, it's just using [].
It looks like there are 2 methods that are used to break down JSON data but I've tried using both and can't seem to get what I want.
lateral flatten(input => product_attributes:data:web_categories)
table(flatten(product_attributes:data:web_categories)) t1
If anyone can help me out I'd appreciate it!
Upvotes: 1
Views: 1364
Reputation: 1108
Here is one approach. You could do this in 1 flatten if size(web_categories)==1
always by doing web_catgories[0] in the first flatten
with mytable as (select parse_json($1) product_attributes from values ('{
"data": {
"availability": "available",
"product_company": "macys",
"id": "22345897290",
"price": 5.99,
"web_categories": [
[
"Beauty",
"Makeup",
"Eye Makeup",
"Brows"
]
]
}}'))
select product_attributes:data:id::string as id, count(distinct(t1.value)) as number_of_categories
from mytable, lateral flatten(product_attributes:data:web_categories) t0, lateral flatten(t0.value) t1
group by 1;
ID NUMBER_OF_CATEGORIES
22345897290 4
Upvotes: 2