I have the following table in a Snowflake data warehouse:
Client_ID | Appointment_Date | Store_ID |
---|---|---|
Client_1 | 1/1/2021 | Store_1 |
Client_2 | 1/1/2021 | Store_1 |
Client_1 | 2/1/2021 | Store_2 |
Client_2 | 2/1/2021 | Store_1 |
Client_1 | 3/1/2021 | Store_1 |
Client_2 | 3/1/2021 | Store_1 |
I need to count the number of unique `Store_ID` values for each `Client_ID`, in order of `Appointment_Date`. Something like the following is my desired output:
Client_ID | Appointment_Date | Store_ID | Count_Different_Stores |
---|---|---|---|
Client_1 | 1/1/2021 | Store_1 | 1 |
Client_2 | 1/1/2021 | Store_1 | 1 |
Client_1 | 2/1/2021 | Store_2 | 2 |
Client_2 | 2/1/2021 | Store_1 | 1 |
Client_1 | 3/1/2021 | Store_1 | 2 |
Client_2 | 3/1/2021 | Store_1 | 1 |
In other words, I want a running count of the distinct stores each client has visited over time. I've tried:
```sql
SELECT Client_ID, Appointment_Date, Store_ID,
       DENSE_RANK() OVER (PARTITION BY CLIENT_ID, STORE_ID ORDER BY APPOINTMENT_DATE)
FROM table
```

This yields:
Client_ID | Appointment_Date | Store_ID | Count_Different_Stores |
---|---|---|---|
Client_1 | 1/1/2021 | Store_1 | 1 |
Client_2 | 1/1/2021 | Store_1 | 1 |
Client_1 | 2/1/2021 | Store_2 | 2 |
Client_2 | 2/1/2021 | Store_1 | 2 |
Client_1 | 3/1/2021 | Store_1 | 3 |
Client_2 | 3/1/2021 | Store_1 | 3 |
I've also tried:

```sql
SELECT Client_ID, Store_ID,
       DENSE_RANK() OVER (PARTITION BY CLIENT_ID, STORE_ID)
FROM table
-- With a join back to the original table with all my needed data
```

This yields:
Client_ID | Appointment_Date | Store_ID | Count_Different_Stores |
---|---|---|---|
Client_1 | 1/1/2021 | Store_1 | 2 |
Client_2 | 1/1/2021 | Store_1 | 1 |
Client_1 | 2/1/2021 | Store_2 | 1 |
Client_2 | 2/1/2021 | Store_1 | 1 |
Client_1 | 3/1/2021 | Store_1 | 1 |
Client_2 | 3/1/2021 | Store_1 | 1 |
The second one is closer to what I need, but the ranking of the distinct stores doesn't take the order of `Appointment_Date` into account, which is critical: sometimes the order happens to be correct, and sometimes it isn't.

Any insight is appreciated; I'm happy to provide more information.
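For reference, the intended result can be stated as a small procedure: walk each client's appointments in date order and keep the set of stores seen so far; the count for a row is the size of that set. The Python sketch below only illustrates those semantics and is not a Snowflake solution. The function name `running_distinct_stores` is hypothetical, and the dates are assumed to be month/day/year, rewritten as ISO strings so that they sort chronologically.

```python
from collections import defaultdict


def running_distinct_stores(rows):
    """Return (client_id, appointment_date, store_id, count) tuples, where count is
    the number of distinct stores the client has visited up to and including that row."""
    seen = defaultdict(set)  # client_id -> set of stores visited so far
    result = []
    # Sort by client, then by date; ISO date strings sort chronologically.
    for client_id, appointment_date, store_id in sorted(rows, key=lambda r: (r[0], r[1])):
        seen[client_id].add(store_id)
        result.append((client_id, appointment_date, store_id, len(seen[client_id])))
    return result


# Sample data from the question (dates assumed to be month/day/year).
rows = [
    ("Client_1", "2021-01-01", "Store_1"),
    ("Client_2", "2021-01-01", "Store_1"),
    ("Client_1", "2021-02-01", "Store_2"),
    ("Client_2", "2021-02-01", "Store_1"),
    ("Client_1", "2021-03-01", "Store_1"),
    ("Client_2", "2021-03-01", "Store_1"),
]

for row in running_distinct_stores(rows):
    print(row)
```

The counts it produces (1, 2, 2 for Client_1 and 1, 1, 1 for Client_2) match the Count_Different_Stores column of the desired output.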
If I understand correctly, you want a cumulative `count(distinct)` as a window function. Snowflake does not support that directly, but you can easily calculate it using `row_number()` and a cumulative sum:
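```sql
select t.*,
       -- running total of "first visit to this store" flags = distinct stores so far
       sum( (seqnum = 1)::int ) over (partition by client_id order by appointment_date) as num_distinct_stores
from (select t.*,
             -- 1 on a client's first appointment at a store, 2, 3, ... on repeat visits
             row_number() over (partition by client_id, store_id order by appointment_date) as seqnum
      from t
     ) t;
```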
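To check the technique outside Snowflake, the sketch below runs the same `row_number()` plus cumulative `sum()` query in SQLite through Python's built-in `sqlite3` module. It is a stand-in under several assumptions: SQLite has no `::int` cast, so the flag is written as `case when seqnum = 1 then 1 else 0 end`; an `order by` is added only to make the output order deterministic; the table name `t`, the column types, and the ISO-formatted dates are illustrative; and window functions require SQLite 3.25 or later. The helper name `cumulative_distinct_stores` is hypothetical.

```python
import sqlite3

# The answer's query, adapted to SQLite syntax (CASE instead of ::int).
QUERY = """
select t.*,
       sum(case when seqnum = 1 then 1 else 0 end)
           over (partition by client_id order by appointment_date) as num_distinct_stores
from (select t.*,
             row_number() over (partition by client_id, store_id
                                order by appointment_date) as seqnum
      from t
     ) t
order by client_id, appointment_date
"""


def cumulative_distinct_stores(rows):
    """Load rows into an in-memory table t and return the query's result rows:
    (client_id, appointment_date, store_id, seqnum, num_distinct_stores)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (client_id text, appointment_date text, store_id text)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Sample data from the question (dates assumed to be month/day/year).
rows = [
    ("Client_1", "2021-01-01", "Store_1"),
    ("Client_2", "2021-01-01", "Store_1"),
    ("Client_1", "2021-02-01", "Store_2"),
    ("Client_2", "2021-02-01", "Store_1"),
    ("Client_1", "2021-03-01", "Store_1"),
    ("Client_2", "2021-03-01", "Store_1"),
]

for row in cumulative_distinct_stores(rows):
    print(row)
```

Because `seqnum` is 1 only on a client's first appointment at each store, summing those flags over a window ordered by date counts each store once, at the point it first appears. In SQLite, a window with `order by` and no explicit frame runs from the start of the partition to the current row, which is what makes the sum cumulative.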