Santoo
Santoo

Reputation: 355

Distinct New Customers in SQL

The table below has customer ID, product purchased, and purchase date.

customer_id products    purchase_date
93738117783 product a   5/24/2022
93738117783 product a   6/8/2022
93738117783 product a   7/19/2022
93738117783 product a   8/18/2022
93738117783 product a   9/22/2022
93738117783 product a   10/19/2022
93738117783 product a   11/17/2022
93738117783 product a   12/27/2022
93738554027 product a   5/5/2021
93738738408 product b   8/2/2021
93738738408 product b   9/20/2021
93738738408 product b   10/26/2021
93738738408 product b   12/2/2021
93738738408 product b   1/2/2022
93738738408 product b   3/27/2022
93738738408 product b   5/2/2022
93738738408 product b   6/10/2022
93738738408 product b   7/8/2022
93738738408 product b   7/31/2022
93738117783 product a   8/1/2022
93738117783 product a   9/5/2022
93738117783 product a   10/8/2022
93738117783 product a   11/16/2022
93738117783 product a   12/19/2022
93738943799 product a   10/21/2020
93738943799 product a   11/20/2020
93738943799 product a   1/24/2021
93739310547 product b   5/3/2022
93739310547 product b   8/19/2022
93739310547 product b   1/5/2023

From this table, I want to create a SQL query to get the following output -

product a   new_customers   week_ending_Friday
product a   2               12/16/2022
product b   3               12/16/2022
product a   1               12/10/2022
product b   4               12/10/2022

new_customers = new to purchasing the product in 1yr from the purchase_date week_ending_Friday = date rolled upto week ending friday

Any idea will help me.

Upvotes: 1

Views: 84

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 94939

First step: get a date's week's Friday. By ISO definition a week starts on Monday, so you'll have to add 4 days to the ISO week's start day to get Friday.

DATE_TRUNC('weekiso', purchase_date) + INTERVAL '4 day'

Correction: The general docs on date and time functions in snowflake show that weekiso is not allowed with DATE_TRUNC. This is a pity, for this means we must find out how many days to add to the purchase date. I think this should do the trick:

purchase_date + (interval '1 day' * (5 - date_part(dayofweekiso, purchase_date)))

Then, in order to find new customers, we would get the first purchse day per customer and product, a simple aggregation, but in your comments you say that you also consider a customer new, when their last puchase was made more than a year before. So, we need a lookup, which we do with [NOT] EXISTS.

select
  day as week_ending_friday,
  product,
  count_if(is_new) as new_customers
from
(
  select
    customer_id,
    product,
    purchase_date +
      (interval '1 day' * (5 - date_part(dayofweekiso, purchase_date))) as day,
    not exists
    (
      select null
      from purchases pp
      where pp.customer_id = p.customer_id
      and pp.product = p.product
      and pp.purchase_date < p.purchase_date
      and pp.purchase_date >= p.purchase_date - interval '1 year'
    ) as is_new
  from purchases p
) evaluated
group by day, product
order by day, product;

Upvotes: 2

Adrian White
Adrian White

Reputation: 1804

enter image description here

The qualify bit :

datediff(day,coalesce(lag(purchase_date)over(partition by customer_id,products order by purchase_date)+365,purchase_date),purchase_date)>=0

This identifies if the customer is new for this product. As the requirements define new to include over 1 year ago (+365) we need to grab the previous purchase (lag) and see how many days previous this was. I defaulted 0 if it's actually the very first purchase, meaning I can add +365 to the datediff and anything 0 and above is either brand new or over 1 year since the previous purchase.

I prefer the qualify approach over nested queries as the SQL looks more elegant, and I'm a lazy typist (ask Thorsten Kettner). The other answer by Thorsten uses almost 50% word/functions etc. However if you find the readability better -> that's the solution for you. So (1) find SQL you can read, (2) choose the understandable SQL with least chars.

The array functions array_agg() and array_size() are pretty cool - have a look at the Snowflake docs they're pretty self explanatory - reach back if you have any questions. I prefer to link to the docs as Snowflake is always fiddling with these functions so the docs can change. Note - I've never seen them break stuff -> but they sometimes add functionality.

The reason I add the cte at the front of the code - is so when smarter folks than I come and look at answers they can copy/paste into Snowflake and get the example working asap. This means we're not wasting their time - quite often actual Snowflake employees answer questions so it's best to make things quick and easy for them to improve answers. I'd recommend all questions have this -> as you're likely to get more answers.

with cte as (select 93738117783 customer_id, 'product a' products,   '5/24/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '6/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '7/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '8/18/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '9/22/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '10/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '11/17/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '12/27/2022'::date purchase_date
union all select 93738554027 customer_id, 'product a' products,   '5/5/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '8/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '9/20/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '10/26/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '12/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '1/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '3/27/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '5/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '6/10/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '7/8/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products,   '7/31/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '8/1/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '9/5/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '10/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '11/16/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products,   '12/19/2022'::date purchase_date
union all select 93738943799 customer_id, 'product a' products,   '10/21/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products,   '11/20/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products,   '1/24/2021'::date purchase_date
union all select 93739310547 customer_id, 'product b' products,   '5/3/2022'::date purchase_date
union all select 93739310547 customer_id, 'product b' products,   '8/19/2020'::date purchase_date
union all select 93739310547 customer_id, 'product b' products,   '1/5/2023'::date purchase_date)

 select   
  DATE_TRUNC('week', purchase_date) + INTERVAL '4 day' week_number
 ,products
 ,array_unique_agg(customer_id)over (partition by week_number, products) new_custs
 ,array_size(new_custs) number_new_custs
from cte 
qualify 
    datediff(day,coalesce(lag(purchase_date)over(partition by customer_id,products order by purchase_date)+365,purchase_date),purchase_date)>=0 
   

Upvotes: 0

Related Questions