Reputation: 103
I have below table called orders which has customer id and their order date (Note: there can be multiple orders from same customer on a single day)
create table orders (Id char, order_dt date)
insert into orders values
('A','1/1/2020'),
('B','1/1/2020'),
('C','1/1/2020'),
('D','1/1/2020'),
('A','1/1/2020'),
('B','1/1/2020'),
('A','2/1/2020'),
('B','2/1/2020'),
('C','2/1/2020'),
('B','2/1/2020'),
('A','3/1/2020'),
('B','3/1/2020')
I'm trying to write an SQL query to find the number of customers who shopped for 3 consecutive days in month of January 2020
Based on above order values, the output should be: 2
I referred other similar questions but still wasn't able to come the exact solution
Upvotes: 0
Views: 4742
Reputation: 486
Here is my solution which works fine even there are many orders of one customer in one day;
Some scripts to build test environment:
create table orders (Id varchar2(1), order_dt date);
insert into orders values('A',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('C',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('D',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('C',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('03/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('03/01/2020','dd/mm/yyyy'));
select distinct id, count_days from (
select id,
order_dt,
count(*) over(partition by id order by order_dt range between 1 preceding and 1 following ) count_days
from orders group by id, order_dt
)
where count_days = 3;
-- Insert for test more days than 3 consecutive
insert into orders values('A',to_date('04/01/2020','dd/mm/yyyy'));
Upvotes: 2
Reputation: 1270573
Hmmmm . . . one method is to use lead()
/lag()
. Assuming that you don't have duplicates on a single day, then:
select distinct id
from (select o.*,
lag(order_dt) over (partition by id order by order_dt) as prev_order_dt,
lag(order_dt, 2) over (partition by id order by order_dt) as prev_order_dt2
from orders o
where order_dt >= date '2020-01-01' and
order_dt < date '2020-02-01'
) o
where prev_order_dt = order_dt - interval '1' day and
prev_order_dt2 = order_dt - interval '2' day;
EDIT:
If the table has duplicate records, the above is easily tweaked:
select distinct id
from (select o.*,
lag(order_dt) over (partition by id order by order_dt) as prev_order_dt,
lag(order_dt, 2) over (partition by id order by order_dt) as prev_order_dt2
from (select distinct o.id, trunc(order_dt) as order_dt
from orders o
where order_dt >= date '2020-01-01' and
order_dt < date '2020-02-01'
) o
) o
where prev_order_dt = order_dt - interval '1' day and
prev_order_dt2 = order_dt - interval '2' day;
Upvotes: 1
Reputation: 48169
why not join twice based on same following two days. As long as you have index on the customer's ID and date, the join should be optimized. Because the joins require match on the same starting date basis, it either finds or it doesn't. If not, it is left out of the result set.
select distinct
o1.id
from
orders o1
JOIN orders o2
on o1.id = o2.id
AND o1.order_dt = o2.order_dt - interval '1' day
JOIN orders o3
on o1.id = o3.id
AND o1.order_dt = o3.order_dt - interval '2' day
Upvotes: 1
Reputation: 6751
You can use two window functions to calculate difference between consequtive dates and sliding window with ROWS
offset to count distinct preceiding consequtive days. Example here:
with gen as (
select 1 as cust_id, (date '2020-01-10') + 1 as q from dual union all
select 1, (date '2020-01-10') + 2 as q from dual union all
select 1, (date '2020-01-10') + 3 as q from dual union all
select 1, (date '2020-01-10') + 3 as q from dual union all
select 1, (date '2020-01-10') + 5 as q from dual union all
select 1, (date '2020-01-10') + 7 as q from dual union all
select 1, (date '2020-01-10') + 8 as q from dual union all
select 1, (date '2020-01-10') + 9 as q from dual
)
, diff as (
select gen.*
, q - lag(q) over(partition by cust_id, trunc(q, 'mm') order by q asc) as datediff
from gen
)
, window as (
select diff.*
, sum(decode(datediff, 1, 1, 0)) over(partition by cust_id, trunc(q, 'mm') order by q asc range between 2 preceding and current row) as cnt
from diff
)
select sum(count(distinct q)) as cnt
from window
where cnt = 2
group by cust_id
Upvotes: 1