Reputation: 355
The table below has customer ID, product purchased, and purchase date.
customer_id products purchase_date
93738117783 product a 5/24/2022
93738117783 product a 6/8/2022
93738117783 product a 7/19/2022
93738117783 product a 8/18/2022
93738117783 product a 9/22/2022
93738117783 product a 10/19/2022
93738117783 product a 11/17/2022
93738117783 product a 12/27/2022
93738554027 product a 5/5/2021
93738738408 product b 8/2/2021
93738738408 product b 9/20/2021
93738738408 product b 10/26/2021
93738738408 product b 12/2/2021
93738738408 product b 1/2/2022
93738738408 product b 3/27/2022
93738738408 product b 5/2/2022
93738738408 product b 6/10/2022
93738738408 product b 7/8/2022
93738738408 product b 7/31/2022
93738117783 product a 8/1/2022
93738117783 product a 9/5/2022
93738117783 product a 10/8/2022
93738117783 product a 11/16/2022
93738117783 product a 12/19/2022
93738943799 product a 10/21/2020
93738943799 product a 11/20/2020
93738943799 product a 1/24/2021
93739310547 product b 5/3/2022
93739310547 product b 8/19/2022
93739310547 product b 1/5/2023
From this table, I want to create a SQL query to get the following output -
product a new_customers week_ending_Friday
product a 2 12/16/2022
product b 3 12/16/2022
product a 1 12/10/2022
product b 4 12/10/2022
new_customers = new to purchasing the product in 1yr from the purchase_date week_ending_Friday = date rolled upto week ending friday
Any idea will help me.
Upvotes: 1
Views: 84
Reputation: 94939
First step: get a date's week's Friday. By ISO definition a week starts on Monday, so you'll have to add 4 days to the ISO week's start day to get Friday.
DATE_TRUNC('weekiso', purchase_date) + INTERVAL '4 day'
Correction: The general docs on date and time functions in snowflake show that weekiso
is not allowed with DATE_TRUNC
. This is a pity, for this means we must find out how many days to add to the purchase date. I think this should do the trick:
purchase_date + (interval '1 day' * (5 - date_part(dayofweekiso, purchase_date)))
Then, in order to find new customers, we would get the first purchse day per customer and product, a simple aggregation, but in your comments you say that you also consider a customer new, when their last puchase was made more than a year before. So, we need a lookup, which we do with [NOT] EXISTS
.
select
day as week_ending_friday,
product,
count_if(is_new) as new_customers
from
(
select
customer_id,
product,
purchase_date +
(interval '1 day' * (5 - date_part(dayofweekiso, purchase_date))) as day,
not exists
(
select null
from purchases pp
where pp.customer_id = p.customer_id
and pp.product = p.product
and pp.purchase_date < p.purchase_date
and pp.purchase_date >= p.purchase_date - interval '1 year'
) as is_new
from purchases p
) evaluated
group by day, product
order by day, product;
Upvotes: 2
Reputation: 1804
The qualify bit :
datediff(day,coalesce(lag(purchase_date)over(partition by customer_id,products order by purchase_date)+365,purchase_date),purchase_date)>=0
This identifies if the customer is new for this product. As the requirements define new to include over 1 year ago (+365) we need to grab the previous purchase (lag) and see how many days previous this was. I defaulted 0 if it's actually the very first purchase, meaning I can add +365 to the datediff and anything 0 and above is either brand new or over 1 year since the previous purchase.
I prefer the qualify approach over nested queries as the SQL looks more elegant, and I'm a lazy typist (ask Thorsten Kettner). The other answer by Thorsten uses almost 50% word/functions etc. However if you find the readability better -> that's the solution for you. So (1) find SQL you can read, (2) choose the understandable SQL with least chars.
The array functions array_agg() and array_size() are pretty cool - have a look at the Snowflake docs they're pretty self explanatory - reach back if you have any questions. I prefer to link to the docs as Snowflake is always fiddling with these functions so the docs can change. Note - I've never seen them break stuff -> but they sometimes add functionality.
The reason I add the cte at the front of the code - is so when smarter folks than I come and look at answers they can copy/paste into Snowflake and get the example working asap. This means we're not wasting their time - quite often actual Snowflake employees answer questions so it's best to make things quick and easy for them to improve answers. I'd recommend all questions have this -> as you're likely to get more answers.
with cte as (select 93738117783 customer_id, 'product a' products, '5/24/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '6/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '7/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '8/18/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '9/22/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '10/19/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '11/17/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '12/27/2022'::date purchase_date
union all select 93738554027 customer_id, 'product a' products, '5/5/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '8/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '9/20/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '10/26/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '12/2/2021'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '1/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '3/27/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '5/2/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '6/10/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '7/8/2022'::date purchase_date
union all select 93738738408 customer_id, 'product b' products, '7/31/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '8/1/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '9/5/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '10/8/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '11/16/2022'::date purchase_date
union all select 93738117783 customer_id, 'product a' products, '12/19/2022'::date purchase_date
union all select 93738943799 customer_id, 'product a' products, '10/21/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products, '11/20/2020'::date purchase_date
union all select 93738943799 customer_id, 'product a' products, '1/24/2021'::date purchase_date
union all select 93739310547 customer_id, 'product b' products, '5/3/2022'::date purchase_date
union all select 93739310547 customer_id, 'product b' products, '8/19/2020'::date purchase_date
union all select 93739310547 customer_id, 'product b' products, '1/5/2023'::date purchase_date)
select
DATE_TRUNC('week', purchase_date) + INTERVAL '4 day' week_number
,products
,array_unique_agg(customer_id)over (partition by week_number, products) new_custs
,array_size(new_custs) number_new_custs
from cte
qualify
datediff(day,coalesce(lag(purchase_date)over(partition by customer_id,products order by purchase_date)+365,purchase_date),purchase_date)>=0
Upvotes: 0