Reputation: 755
I have a Key-value based records in the Snowflake table, where for a given product_id there are dozens of Key-value pairs records. See the example below:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * from t1;
Given this data model, the requirement is to fetch product_ids that match the filter criteria of key values.
Example 1: Fetch all product_id where key=(grade) is having value=(high or medium) & key=(expense_cost) has value=(high)
Example 2: Fetch all product_id where key=(grade) has value=(high) & key=(maintenance_cost) has value=(high or medium)
I am able to solve this requirement using a Snowflake PIVOT
function, which first converts Key-Value data structure to Columnar data structure and then applies a Filter
condition using the WHERE
clause. Is there a better way to solve this problem without using PIVOTs, for example by using some Window functionality, etc.?
My PIVOT based solution for Example 1:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * from (
select product_id, key, value
from t1
where key in ('grade','expense_cost','maintenance_cost')
) pivot(min(value) for key in ('grade','expense_cost','maintenance_cost'))
as p (product_id, grade, expense_cost, maintenance_cost)
where grade in ('high','medium')
and expense_cost in ('high');
NOTE: IN case of using Window function, the output must only the contain the rows which qualify or pass the filter condition. The output must not contain all the records in the qualified partition.
Upvotes: 0
Views: 7312
Reputation: 755
Adding my solution below which is an improvement on @Lukasz Szozda's solution which had the following 2 problems:
a. For each Attribute in the Filter criteria his code needed to add an additional Window function that increased the Snowflake computation time linearly.
b. There were dozens of Attributes that needed to be removed from the resultset and adding so many Window functions for each of those Attributes would make the SQL query very bloated.
Here is my improved code:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * ,
(CASE
WHEN key = 'grade' AND value IN ('medium', 'high') THEN TRUE
WHEN key = 'expense_cost' AND value = 'high' THEN TRUE
ELSE FALSE
END) AS is_allowed
from t1
WHERE key IN ('grade', 'expense_cost')
QUALIFY COUNT_IF(is_allowed=TRUE) OVER(PARTITION BY product_id) = 2;
Upvotes: 0
Reputation: 175716
The pattern to use is QUALIFY
combined with COUNT_IF
. It could be pivoted if needed afterwards:
"Requirement 1: Fetch all product_id where grade is (high or medium) & expense_cost = high"
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select *
from t1
qualify COUNT_IF(key='grade' AND value IN ('medium', 'high'))
OVER(PARTITION BY product_id) > 0
AND COUNT_IF(key = 'expense_cost' AND value = 'high')
OVER(PARTITION BY product_id) > 0;
Output:
Requirement 2: Fetch all product_id where key=(grade) has value=(high) & key=(maintenance_cost) has value=(high or medium)
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select *
from t1
qualify COUNT_IF(key='grade' AND value IN ('high'))
OVER(PARTITION BY product_id) > 0
AND COUNT_IF(key = 'maintenance_cost' AND value IN ('medium','high'))
OVER(PARTITION BY product_id) > 0
AND COUNT_IF(key='expense_cost' AND value IN ('high'))
OVER(PARTITION BY product_id) = 0 -- explicitly excluding
Output:
Upvotes: 3
Reputation: 5803
I think it's easier and more self-documenting if you sub-select qualifying product_ids using conditional-aggregation, then filter on qualifying product_ids plus the conditions in the where clause. Below is a demo for requirement #2, but you can easily modify it for #1.
with t (product_id, keys, value) as
(select 101, 'grade','high' union all
select 101, 'expense_cost','high' union all
select 101, 'maintenance_cost','medium' union all
select 102, 'grade','medium' union all
select 102, 'expense_cost','high' union all
select 103, 'expense_cost','high' union all
select 103, 'maintenance_cost','medium' ),
product_ids as
(select product_id
from t
group by product_id
having sum(case when keys='grade' and value ='high' then 1 end)>0 and
sum(case when keys='maintenance_cost' and value in ('high', 'medium') then 1 end)>0)
select *
from t
where product_id in (select product_id from product_ids) and
((keys='grade' and value ='high') or (keys='maintenance_cost' and value in ('high', 'medium')))
Upvotes: 0