Chris
Chris

Reputation: 798

SQL -- How do I write a select statement that finds customers who placed orders on 5 consecutive days

orderId, orderDate, customerId are the relevant fields here.

A customer may have more than 1 5-day streak.

What I want the output to look like is something like this

customerID  startDate       endDate      numDays 
1           2020/01/01      2020/01/05    5
1           2020/10/1       2020/10/10   10
101         2020/04/10      2020/04/15    6

So far, this is what I have:

    ;
with t1 as (

    select distinct o.idcustomer,orderdate, dateadd(dd,1,orderdate) nextOrderDate, 1 as tday, orderstatus
    from orders o
        join customers c on c.idcustomer=o.idcustomer
    where orderstatus in (3,4) and c.customertype=0
), t2 as (
    select * from t1
    union all
    select o2.idcustomer, o2.orderdate, dateadd(dd,1,o2.orderdate), o.tday+1, o2.orderstatus
    from t1 o2
        join t2 o on o2.idcustomer=o.idcustomer and o2.orderdate=o.nextOrderDate and o2.orderstatus in (3,4)
)

--select idcustomer, max(tday) DaysInARow, min(orderDate) StartDate, max(orderdate)  endDate
select idcustomer, dateadd(dd,-5,min(orderdate)) firstOrderDate, max(orderdate) lastOrderDate
from t2 
where tday>=5
group by idcustomer, tday
order by idcustomer

Upvotes: 2

Views: 2236

Answers (3)

Chris
Chris

Reputation: 798

Thank you everyone!

This is the code that gives me what I need:

    ;
with t1 as (

    select distinct o.idcustomer,orderdate
    from orders o
        join customers c on c.idcustomer=o.idcustomer
    where orderstatus in (3,4) and c.customertype=0
)
select idCustomer, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt
from (
    select idcustomer, orderdate, 
        row_number() over(partition by idcustomer order by orderdate) rn
    from t1
) t
group by idcustomer, dateadd(day, -rn, orderdate)
having count(*) >= 5 
order by idcustomer, cnt

The CTE gives me a unique idCustomer and orderdate (eliminates multiple orders on the same day) and from there, I used @GMB's (https://stackoverflow.com/users/10676716/gmb) first example to create the output.

Very cool answer using the row_number to calculate the date grouping.

Upvotes: 0

Menelaos
Menelaos

Reputation: 26549

I would use the following approach and begin with a query like the following:

SQLFiddle

WITH truncatedQueries as (
select distinct idcustomer, TRUNC(orderDate) as orderDate from orders 
),
rawData as (
select a.idcustomer,a.orderDate as order1, b.orderDate as order2, b.orderDate-a.orderDate  from truncatedQueries a inner join 
truncatedQueries b on 
a.idcustomer = b.idcustomer AND a.orderDate < b.orderDate  AND
 b.orderDate-a.orderDate =1
),
intervals as (
select idCustomer as CustomerId, min(order1) as StartDate, max(order2) as EndDate from rawData
group by idCustomer order by idCustomer
)
select CustomerId,StartDate, EndDate, endDate - StartDate as numDays  from intervals where endDate - StartDate >=5;

The algorithm is the following:

  1. I truncate the times from the dates, and do a distinct query. By doing this I get 1 entry per order per customer per date.
  2. I then do a self-join, of these results to each other and look for consecutive days. I fnd consecutive days by keeping only the rows that have a 1 day difference between 2 orders (of the same customer).
  3. Then I end up with results like the following:
ID    ORDERDATE   ORDERDATE   DIFFERENCE
    1 22-APR-11   23-APR-11   1
    1 23-APR-11   24-APR-11   1
    2 22-APR-11   23-APR-11   1

Then we have the basic data, but we are missing two more things:

A) The start of the interval B) The end of the interval C) The difference between them (number of days)

  1. For this, I used the min and max functions. If we get the min from the first orderDate column, and the max from the second orderDate column we have the results. The last part is to do the subtraction and to check for >=5 if necessary.

Upvotes: 0

GMB
GMB

Reputation: 222592

This is a gaps-and-islands problem, where you want to group together consecutive days where a customer had an order.

If a customer has at most one order per day, you can build groups using date arithmetics against an incrementing sequence. Assuming that you are running SQL Server, as the syntax of your current query suggests:

select customer_id, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt
from (
    select c.customerid, o.orderdate, 
        row_number() over(partition by customerid order by o.orderdate) rn
    from orders o
    inner join customers c on c.idcustomer = o.idcustomer
    where o.orderstatus in (3, 4) and c.customertype = 0
) t
group by c.customer_id, dateadd(day, -rn, o.orderdate)

If you want to display only streaks of 5 days of more, just add a having clause:

having count(*) >= 5

And, if you just want the greatest streak per customer (with a minimum length of 5):

select *
from (
    select customer_id, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt,
        rank() over(partition by customer_id order by count(*) desc) rn2
    from (
        select c.customerid, o.orderdate, 
            row_number() over(partition by customerid order by o.orderdate) rn
        from orders o
        inner join customers c on c.idcustomer = o.idcustomer
        where o.orderstatus in (3, 4) and c.customertype = 0
    ) t
    group by c.customer_id, dateadd(day, -rn, o.orderdate)
    having count(*) >= 5
) t
where rn2 = 1

If there are duplicates (customerid, orderdate), then we use dense_rank() instead of row_number(), and count(distinct orderdate) instead of count(*).

Upvotes: 1

Related Questions