Nova700
Nova700

Reputation: 11

Countif or CASE with multiple conditions

I am trying to figure out the most efficient way to count products being placed in a online cart . I have ranked the first 3 items placed in a cart by purchase time(time they were put in the cart not actual check out time), but now am struggling to figure out a way to count the different combinations of items going into the cart.

Counting the individual ranks is easy enough, but I need to figure out a count for purchasing product 1 first and product 1 second as well as all the combinations possible (5 products total). I only need to count first items in the cart, all combinations of first item in cart to second item in cart, and all combinations of second item in cart to third third item in cart.

SELECT
COUNTIF(product = 'Product1' and rank = 1) as firstpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product1' and rank = 2)) as firstpurchase_product1_secondpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product2' and rank = 2)) as firstpurchase_product1_secondpurchase_product2,
#code would continue for all combinations.

FROM(

customer_info.customer_id as customer_id,
customer_info.session_id as session_id,
customer_info.product_purchased as product,
ROW_NUMBER() OVER (PARTITION BY customer_info.session_id ORDER BY customer_info.purchase_time ASC) AS rank
FROM customer_purchases cp,
WHERE p_date >= "2022-04-12" 
)rnk

where rnk.finish_rank in (1,2,3)

This seems like a lot of code, is there a better way to do it? The query is returning 0 for all line except when counting just first purchases, should I be using CASE instead?

Any thoughts or ideas would be appreciated.

Thanks!

Example of input:

Product 1, Product 2, Product 3

Product 1, Product 1, Product 1

Product 4, Product 2, Product 1

Product 3, Product 3, Product 5

Product 4, Product 2, Product 4

--this goes on for hundreds of lines

Output: Count Product 1 in first column

Count Product 2 in first column

#continue for all 5

Count of customers who put product 1 in cart first AND product 1 in cart second

Count of customers who put product 1 in cart first AND product 2 in cart second

###continue with all combinations with product 1

Count of customers who put product 2 in cart first and product 1 in cart second

Count of customers who put product 2 in the cart first and product 2 in the cart second

###continue with all combinations of product 2,3,4, and 5

Upvotes: 0

Views: 692

Answers (1)

Paddy Alton
Paddy Alton

Reputation: 2348

It seems to me that you want to GROUP BY a set of columns (item1, item2, item3) and produce a count of the number of times each combination occurs.

Possibly (it's a little unclear from your wording - a well-formatted table showing example raw data and desired results for that example would be helpful), you want to know an overall count for values of item1 regardless of the other items. This can be achieved via GROUP BY ROLLUP(item1, item2, item3).

So, our aim is to get an unaggregated table with those columns, so that we can aggregate it as described!

You have a long-format table (customer ID, session ID, product, rank) and we want a wide-format table with a column for each value of rank. This is a PIVOT operation:

WITH rnk AS (
  SELECT
     customer_id,
     session_id,
     product_purchased AS product,
     ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY purchase_time ASC) AS rank
   FROM customer_info
  WHERE p_date >= "2022-04-12"
QUALIFY rank IN (1,2,3)
),
pivoted AS (
  SELECT * 
    FROM rnk PIVOT(
      ANY_VALUE(product) AS item FOR rank in (1,2,3)
    )
)

SELECT
    item_1,
    item_2,
    item_3,
    COUNT(*) AS N
  FROM
    pivoted
GROUP BY
    ROLLUP(item_1, item_2, item_3)

Does that get you what you want?


A couple of features to note:

  1. I use common table expressions (WITH) to make this more readable
  2. QUALIFY is a filter clause to apply to the output of a window clause
  3. Pivoting requires an aggregation function because in general there could be many records with the same value of session, product, and rank. Here we know there will be one record only, so it's safe to use ANY_VALUE (which 'aggregates' by non-deterministically choosing one of the values).
  4. Just to prevent confusion: ROLLUP will give you something like 'Product A', NULL, NULL for some of its records - this doesn't mean items 2 and 3 don't exist, it's just how it signals those records that group only by item 1 and aggregate over all values of the other items.

Upvotes: 1

Related Questions