Reputation: 11
I am trying to figure out the most efficient way to count products being placed in a online cart . I have ranked the first 3 items placed in a cart by purchase time(time they were put in the cart not actual check out time), but now am struggling to figure out a way to count the different combinations of items going into the cart.
Counting the individual ranks is easy enough, but I need to figure out a count for purchasing product 1 first and product 1 second as well as all the combinations possible (5 products total). I only need to count first items in the cart, all combinations of first item in cart to second item in cart, and all combinations of second item in cart to third third item in cart.
SELECT
COUNTIF(product = 'Product1' and rank = 1) as firstpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product1' and rank = 2)) as firstpurchase_product1_secondpurchase_product1,
COUNTIF((product = 'Product1' and rank = 1) and (product = 'Product2' and rank = 2)) as firstpurchase_product1_secondpurchase_product2,
#code would continue for all combinations.
FROM(
customer_info.customer_id as customer_id,
customer_info.session_id as session_id,
customer_info.product_purchased as product,
ROW_NUMBER() OVER (PARTITION BY customer_info.session_id ORDER BY customer_info.purchase_time ASC) AS rank
FROM customer_purchases cp,
WHERE p_date >= "2022-04-12"
)rnk
where rnk.finish_rank in (1,2,3)
This seems like a lot of code, is there a better way to do it? The query is returning 0 for all line except when counting just first purchases, should I be using CASE instead?
Any thoughts or ideas would be appreciated.
Thanks!
Example of input:
Product 1, Product 2, Product 3
Product 1, Product 1, Product 1
Product 4, Product 2, Product 1
Product 3, Product 3, Product 5
Product 4, Product 2, Product 4
--this goes on for hundreds of lines
Output: Count Product 1 in first column
Count Product 2 in first column
#continue for all 5
Count of customers who put product 1 in cart first AND product 1 in cart second
Count of customers who put product 1 in cart first AND product 2 in cart second
###continue with all combinations with product 1
Count of customers who put product 2 in cart first and product 1 in cart second
Count of customers who put product 2 in the cart first and product 2 in the cart second
###continue with all combinations of product 2,3,4, and 5
Upvotes: 0
Views: 692
Reputation: 2348
It seems to me that you want to GROUP BY
a set of columns (item1
, item2
, item3
) and produce a count of the number of times each combination occurs.
Possibly (it's a little unclear from your wording - a well-formatted table showing example raw data and desired results for that example would be helpful), you want to know an overall count for values of item1
regardless of the other items. This can be achieved via GROUP BY ROLLUP(item1, item2, item3)
.
So, our aim is to get an unaggregated table with those columns, so that we can aggregate it as described!
You have a long-format table (customer ID, session ID, product, rank) and we want a wide-format table with a column for each value of rank. This is a PIVOT
operation:
WITH rnk AS (
SELECT
customer_id,
session_id,
product_purchased AS product,
ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY purchase_time ASC) AS rank
FROM customer_info
WHERE p_date >= "2022-04-12"
QUALIFY rank IN (1,2,3)
),
pivoted AS (
SELECT *
FROM rnk PIVOT(
ANY_VALUE(product) AS item FOR rank in (1,2,3)
)
)
SELECT
item_1,
item_2,
item_3,
COUNT(*) AS N
FROM
pivoted
GROUP BY
ROLLUP(item_1, item_2, item_3)
Does that get you what you want?
A couple of features to note:
QUALIFY
is a filter clause to apply to the output of a window clauseANY_VALUE
(which 'aggregates' by non-deterministically choosing one of the values).ROLLUP
will give you something like 'Product A', NULL, NULL for some of its records - this doesn't mean items 2 and 3 don't exist, it's just how it signals those records that group only by item 1 and aggregate over all values of the other items.Upvotes: 1