VBAmazing

SQL: How can I count unique instances grouped by client ordered by date?

I have the following table in a Snowflake data warehouse:

Client_ID  Appointment_Date  Store_ID
Client_1   1/1/2021          Store_1
Client_2   1/1/2021          Store_1
Client_1   2/1/2021          Store_2
Client_2   2/1/2021          Store_1
Client_1   3/1/2021          Store_1
Client_2   3/1/2021          Store_1

I need to count the number of unique Store_ID values for each Client_ID cumulatively, in order of Appointment_Date. Something like the following is my desired output:

Client_ID  Appointment_Date  Store_ID  Count_Different_Stores
Client_1   1/1/2021          Store_1   1
Client_2   1/1/2021          Store_1   1
Client_1   2/1/2021          Store_2   2
Client_2   2/1/2021          Store_1   1
Client_1   3/1/2021          Store_1   2
Client_2   3/1/2021          Store_1   1

That is, I want a running count of the distinct stores each client has visited as of each appointment. I've tried:

SELECT Client_ID, Appointment_Date, Store_ID,
       DENSE_RANK() OVER (PARTITION BY CLIENT_ID, STORE_ID ORDER BY APPOINTMENT_DATE) AS Count_Different_Stores
FROM table

Which yields:

Client_ID  Appointment_Date  Store_ID  Count_Different_Stores
Client_1   1/1/2021          Store_1   1
Client_2   1/1/2021          Store_1   1
Client_1   2/1/2021          Store_2   2
Client_2   2/1/2021          Store_1   2
Client_1   3/1/2021          Store_1   3
Client_2   3/1/2021          Store_1   3

And:

SELECT Client_ID, Store_ID,
       DENSE_RANK() OVER (PARTITION BY CLIENT_ID, STORE_ID) AS Count_Different_Stores
FROM table
-- with a join back to the original table to pull in the other columns I need

Which yields:

Client_ID  Appointment_Date  Store_ID  Count_Different_Stores
Client_1   1/1/2021          Store_1   2
Client_2   1/1/2021          Store_1   1
Client_1   2/1/2021          Store_2   1
Client_2   2/1/2021          Store_1   1
Client_1   3/1/2021          Store_1   1
Client_2   3/1/2021          Store_1   1

The second one is closer to what I need, but the ranking of the distinct stores does not take the order of Appointment_Date into account, which is critical: sometimes the order comes out correct and sometimes it does not.
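Because the date order is the crux, it may help to pin down the target semantics procedurally. The following is a minimal Python sketch, not a SQL solution: walk each client's appointments in date order, keep a set of the stores seen so far, and report the size of that set. The function name `running_distinct_stores` and the `SAMPLE_ROWS` list are hypothetical; the sample dates are read as month/day/year, which is an assumption (reading them as day/month keeps the same order here). Appointments by one client on the same date share one count, which is also a modeling assumption.

```python
from collections import defaultdict
from datetime import date
from itertools import groupby

# Sample rows from the question (assumption: dates read as month/day/year).
SAMPLE_ROWS = [
    ("Client_1", date(2021, 1, 1), "Store_1"),
    ("Client_2", date(2021, 1, 1), "Store_1"),
    ("Client_1", date(2021, 2, 1), "Store_2"),
    ("Client_2", date(2021, 2, 1), "Store_1"),
    ("Client_1", date(2021, 3, 1), "Store_1"),
    ("Client_2", date(2021, 3, 1), "Store_1"),
]


def running_distinct_stores(rows):
    """Hypothetical helper: return (client_id, appointment_date, store_id, count)
    tuples, where count is the number of distinct stores the client has
    visited on or before appointment_date."""
    # Group each client's appointments so they can be walked in date order.
    by_client = defaultdict(list)
    for client_id, appt_date, store_id in rows:
        by_client[client_id].append((appt_date, store_id))

    out = []
    for client_id, visits in by_client.items():
        seen = set()  # stores this client has visited so far
        visits.sort()  # chronological order (ties broken by store_id)
        # Appointments on the same date share one count.
        for appt_date, group in groupby(visits, key=lambda v: v[0]):
            stores = [store_id for _, store_id in group]
            seen.update(stores)
            for store_id in stores:
                out.append((client_id, appt_date, store_id, len(seen)))

    # Present the rows like the question's table: by date, then client.
    out.sort(key=lambda r: (r[1], r[0]))
    return out


for row in running_distinct_stores(SAMPLE_ROWS):
    print(row)
```

On the sample data the last column comes out as 1, 1, 2, 1, 2, 1, which matches the desired Count_Different_Stores column.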

Any insight is appreciated; I'm happy to provide more information.

Upvotes: 0

Views: 74

Answers (1)

Gordon Linoff

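If I understand correctly, you want a cumulative count(distinct) as a window function. Snowflake does not support that directly, but you can calculate it using row_number() and a cumulative sum: number each client's appointments at each store in date order, flag the first one (seqnum = 1), and take a running sum of those flags per client: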

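select t.*,
       -- running total of "first visit to this store" flags = distinct stores so far
       sum( (seqnum = 1)::int ) over (partition by client_id order by appointment_date) as num_distinct_stores
from (select t.*,
             -- seqnum = 1 marks the client's first appointment at each store
             row_number() over (partition by client_id, store_id order by appointment_date) as seqnum
      from t
     ) t;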
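To check the query against the question's sample data without a Snowflake account, here is a minimal sketch that runs it on an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in: `(seqnum = 1)::int` is rewritten as a CASE expression because `::` casts are not SQLite syntax, a final ORDER BY is added for readability, and window functions require SQLite 3.25 or later. The table name `t`, the ISO-8601 date strings, and the helper name `cumulative_distinct_stores` are assumptions for illustration.

```python
import sqlite3

# The answer's query adapted to SQLite (a stand-in for Snowflake):
# `(seqnum = 1)::int` becomes a CASE expression, and an ORDER BY is added
# so rows come back in the question's order. Needs SQLite 3.25+.
QUERY = """
select t.*,
       sum(case when seqnum = 1 then 1 else 0 end)
           over (partition by client_id order by appointment_date) as num_distinct_stores
from (select t.*,
             row_number() over (partition by client_id, store_id
                                order by appointment_date) as seqnum
      from t
     ) t
order by appointment_date, client_id
"""

# Sample data from the question, with dates stored as ISO-8601 text
# (an assumption) so that text order matches chronological order.
SAMPLE_ROWS = [
    ("Client_1", "2021-01-01", "Store_1"),
    ("Client_2", "2021-01-01", "Store_1"),
    ("Client_1", "2021-02-01", "Store_2"),
    ("Client_2", "2021-02-01", "Store_1"),
    ("Client_1", "2021-03-01", "Store_1"),
    ("Client_2", "2021-03-01", "Store_1"),
]


def cumulative_distinct_stores(rows):
    """Load rows into an in-memory table named t and run the query.
    Each result row is (client_id, appointment_date, store_id, seqnum,
    num_distinct_stores)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (client_id text, appointment_date text, store_id text)"
        )
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in cumulative_distinct_stores(SAMPLE_ROWS):
    print(row)
```

The last column of each row comes out as 1, 1, 2, 1, 2, 1, the same as the Count_Different_Stores column the question asks for. Note that `t.*` in the outer query also carries the helper column seqnum into the result.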

Upvotes: 1
