Reputation: 798
orderId, orderDate, customerId are the relevant fields here.
A customer may have more than 1 5-day streak.
What I want the output to look like is something like this
customerID startDate endDate numDays 1 2020/01/01 2020/01/05 5 1 2020/10/1 2020/10/10 10 101 2020/04/10 2020/04/15 6
So far, this is what I have:
;
with t1 as (
select distinct o.idcustomer,orderdate, dateadd(dd,1,orderdate) nextOrderDate, 1 as tday, orderstatus
from orders o
join customers c on c.idcustomer=o.idcustomer
where orderstatus in (3,4) and c.customertype=0
), t2 as (
select * from t1
union all
select o2.idcustomer, o2.orderdate, dateadd(dd,1,o2.orderdate), o.tday+1, o2.orderstatus
from t1 o2
join t2 o on o2.idcustomer=o.idcustomer and o2.orderdate=o.nextOrderDate and o2.orderstatus in (3,4)
)
--select idcustomer, max(tday) DaysInARow, min(orderDate) StartDate, max(orderdate) endDate
select idcustomer, dateadd(dd,-5,min(orderdate)) firstOrderDate, max(orderdate) lastOrderDate
from t2
where tday>=5
group by idcustomer, tday
order by idcustomer
Upvotes: 2
Views: 2236
Reputation: 798
Thank you everyone!
This is the code that gives me what I need:
;
with t1 as (
select distinct o.idcustomer,orderdate
from orders o
join customers c on c.idcustomer=o.idcustomer
where orderstatus in (3,4) and c.customertype=0
)
select idCustomer, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt
from (
select idcustomer, orderdate,
row_number() over(partition by idcustomer order by orderdate) rn
from t1
) t
group by idcustomer, dateadd(day, -rn, orderdate)
having count(*) >= 5
order by idcustomer, cnt
The CTE gives me a unique idCustomer and orderdate (eliminates multiple orders on the same day) and from there, I used @GMB's (https://stackoverflow.com/users/10676716/gmb) first example to create the output.
Very cool answer using the row_number to calculate the date grouping.
Upvotes: 0
Reputation: 26549
I would use the following approach and begin with a query like the following:
WITH truncatedQueries as (
select distinct idcustomer, TRUNC(orderDate) as orderDate from orders
),
rawData as (
select a.idcustomer,a.orderDate as order1, b.orderDate as order2, b.orderDate-a.orderDate from truncatedQueries a inner join
truncatedQueries b on
a.idcustomer = b.idcustomer AND a.orderDate < b.orderDate AND
b.orderDate-a.orderDate =1
),
intervals as (
select idCustomer as CustomerId, min(order1) as StartDate, max(order2) as EndDate from rawData
group by idCustomer order by idCustomer
)
select CustomerId,StartDate, EndDate, endDate - StartDate as numDays from intervals where endDate - StartDate >=5;
The algorithm is the following:
distinct
query. By doing this I get 1 entry per order per customer per date.ID ORDERDATE ORDERDATE DIFFERENCE 1 22-APR-11 23-APR-11 1 1 23-APR-11 24-APR-11 1 2 22-APR-11 23-APR-11 1
Then we have the basic data, but we are missing two more things:
A) The start of the interval B) The end of the interval C) The difference between them (number of days)
min
and max
functions. If we get the min
from the first orderDate column, and the max
from the second orderDate
column we have the results. The last part is to do the subtraction and to check for >=5
if necessary.Upvotes: 0
Reputation: 222592
This is a gaps-and-islands problem, where you want to group together consecutive days where a customer had an order.
If a customer has at most one order per day, you can build groups using date arithmetics against an incrementing sequence. Assuming that you are running SQL Server, as the syntax of your current query suggests:
select customer_id, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt
from (
select c.customerid, o.orderdate,
row_number() over(partition by customerid order by o.orderdate) rn
from orders o
inner join customers c on c.idcustomer = o.idcustomer
where o.orderstatus in (3, 4) and c.customertype = 0
) t
group by c.customer_id, dateadd(day, -rn, o.orderdate)
If you want to display only streaks of 5 days of more, just add a having
clause:
having count(*) >= 5
And, if you just want the greatest streak per customer (with a minimum length of 5):
select *
from (
select customer_id, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt,
rank() over(partition by customer_id order by count(*) desc) rn2
from (
select c.customerid, o.orderdate,
row_number() over(partition by customerid order by o.orderdate) rn
from orders o
inner join customers c on c.idcustomer = o.idcustomer
where o.orderstatus in (3, 4) and c.customertype = 0
) t
group by c.customer_id, dateadd(day, -rn, o.orderdate)
having count(*) >= 5
) t
where rn2 = 1
If there are duplicates (customerid, orderdate)
, then we use dense_rank()
instead of row_number()
, and count(distinct orderdate)
instead of count(*)
.
Upvotes: 1