sunny_dev
sunny_dev

Reputation: 755

Snowflake query a partition with filter condition

I have a Key-value based records in the Snowflake table, where for a given product_id there are dozens of Key-value pairs records. See the example below:

enter image description here

with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
    select * from t1;

Given this data model, the requirement is to fetch product_ids that match the filter criteria of key values.

Example 1: Fetch all product_id where key=(grade) is having value=(high or medium) & key=(expense_cost) has value=(high)

Example 2: Fetch all product_id where key=(grade) has value=(high) & key=(maintenance_cost) has value=(high or medium)

I am able to solve this requirement using a Snowflake PIVOT function, which first converts Key-Value data structure to Columnar data structure and then applies a Filter condition using the WHERE clause. Is there a better way to solve this problem without using PIVOTs, for example by using some Window functionality, etc.?

My PIVOT based solution for Example 1:

with t1 (product_id, key, value) as 
(
select 101, 'grade',                'high' union all
select 101, 'expense_cost',     'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade',                'medium' union all
select 102, 'expense_cost',     'high' union all
select 103, 'expense_cost',     'high' union all
select 103, 'maintenance_cost', 'medium' 
)
select * from (
select product_id, key, value 
  from t1 
  where key in ('grade','expense_cost','maintenance_cost')
) pivot(min(value) for key in ('grade','expense_cost','maintenance_cost'))
as p (product_id, grade, expense_cost, maintenance_cost)
where grade in ('high','medium')
and expense_cost in ('high');

NOTE: IN case of using Window function, the output must only the contain the rows which qualify or pass the filter condition. The output must not contain all the records in the qualified partition.

Upvotes: 0

Views: 7312

Answers (3)

sunny_dev
sunny_dev

Reputation: 755

Adding my solution below which is an improvement on @Lukasz Szozda's solution which had the following 2 problems:

a. For each Attribute in the Filter criteria his code needed to add an additional Window function that increased the Snowflake computation time linearly.

b. There were dozens of Attributes that needed to be removed from the resultset and adding so many Window functions for each of those Attributes would make the SQL query very bloated.

Here is my improved code:

with t1 (product_id, key, value) as 
(
select 101, 'grade',                'high' union all
select 101, 'expense_cost',     'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade',                'medium' union all
select 102, 'expense_cost',     'high' union all
select 103, 'expense_cost',     'high' union all
select 103, 'maintenance_cost', 'medium' 
)
select * , 
 (CASE 
      WHEN key = 'grade' AND value IN ('medium', 'high') THEN TRUE
      WHEN key = 'expense_cost' AND value = 'high' THEN TRUE
      ELSE FALSE
  END) AS is_allowed
from t1
WHERE key IN ('grade', 'expense_cost')
QUALIFY COUNT_IF(is_allowed=TRUE) OVER(PARTITION BY product_id) = 2;

Upvotes: 0

Lukasz Szozda
Lukasz Szozda

Reputation: 175716

The pattern to use is QUALIFY combined with COUNT_IF. It could be pivoted if needed afterwards:

"Requirement 1: Fetch all product_id where grade is (high or medium) & expense_cost = high"

with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
select * 
from t1
qualify COUNT_IF(key='grade' AND value IN ('medium', 'high')) 
            OVER(PARTITION BY product_id) > 0
    AND COUNT_IF(key = 'expense_cost' AND value = 'high') 
            OVER(PARTITION BY product_id) > 0;

Output:

enter image description here


Requirement 2: Fetch all product_id where key=(grade) has value=(high) & key=(maintenance_cost) has value=(high or medium)

with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
select * 
from t1
qualify COUNT_IF(key='grade' AND value IN ('high')) 
            OVER(PARTITION BY product_id) > 0
    AND COUNT_IF(key = 'maintenance_cost' AND value IN ('medium','high'))
            OVER(PARTITION BY product_id) > 0
    AND COUNT_IF(key='expense_cost' AND value IN ('high')) 
           OVER(PARTITION BY product_id) = 0  -- explicitly excluding

Output:

enter image description here

Upvotes: 3

Rajat
Rajat

Reputation: 5803

I think it's easier and more self-documenting if you sub-select qualifying product_ids using conditional-aggregation, then filter on qualifying product_ids plus the conditions in the where clause. Below is a demo for requirement #2, but you can easily modify it for #1.

with t (product_id, keys, value) as 

(select 101, 'grade','high' union all
 select 101, 'expense_cost','high' union all
 select 101, 'maintenance_cost','medium' union all
 select 102, 'grade','medium' union all
 select 102, 'expense_cost','high' union all
 select 103, 'expense_cost','high' union all
 select 103, 'maintenance_cost','medium' ),

product_ids as

(select product_id
 from t
 group by product_id
 having sum(case when keys='grade' and value ='high' then 1 end)>0 and
        sum(case when keys='maintenance_cost' and value in ('high', 'medium') then 1 end)>0)
                            
select * 
from t
where product_id in (select product_id from product_ids) and
    ((keys='grade' and value ='high') or (keys='maintenance_cost' and value in ('high', 'medium')))
        

Upvotes: 0

Related Questions