bzflag
bzflag

Reputation: 103

SQL query to find the number of customers who shopped for 3 consecutive days in month of January 2020

I have below table called orders which has customer id and their order date (Note: there can be multiple orders from same customer on a single day)

create table orders (Id char, order_dt date)

insert into orders values
('A','1/1/2020'),
('B','1/1/2020'),
('C','1/1/2020'),
('D','1/1/2020'),
('A','1/1/2020'),
('B','1/1/2020'),
('A','2/1/2020'),
('B','2/1/2020'),
('C','2/1/2020'),
('B','2/1/2020'),
('A','3/1/2020'),
('B','3/1/2020')

I'm trying to write an SQL query to find the number of customers who shopped for 3 consecutive days in month of January 2020

Based on above order values, the output should be: 2

I referred other similar questions but still wasn't able to come the exact solution

Upvotes: 0

Views: 4742

Answers (4)

Monika Lewandowska
Monika Lewandowska

Reputation: 486

Here is my solution which works fine even there are many orders of one customer in one day;

Some scripts to build test environment:

create table orders (Id varchar2(1), order_dt date);

insert into orders values('A',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('C',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('D',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('01/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('C',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('02/01/2020','dd/mm/yyyy'));
insert into orders values('A',to_date('03/01/2020','dd/mm/yyyy'));
insert into orders values('B',to_date('03/01/2020','dd/mm/yyyy'));


 select distinct id,  count_days from (
    select id,
           order_dt,
           count(*) over(partition by id order by order_dt range between 1  preceding  and 1 following  )  count_days
    from orders group by  id, order_dt
)
where count_days = 3;

--  Insert for test more days than 3 consecutive

insert into orders values('A',to_date('04/01/2020','dd/mm/yyyy'));

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270573

Hmmmm . . . one method is to use lead()/lag(). Assuming that you don't have duplicates on a single day, then:

select distinct id
from (select o.*,
             lag(order_dt) over (partition by id order by order_dt) as prev_order_dt,
             lag(order_dt, 2) over (partition by id order by order_dt) as prev_order_dt2
      from orders o
      where order_dt >= date '2020-01-01' and 
            order_dt < date '2020-02-01'
     ) o
where prev_order_dt = order_dt - interval '1' day and
      prev_order_dt2 = order_dt - interval '2' day;

EDIT:

If the table has duplicate records, the above is easily tweaked:

select distinct id
from (select o.*,
             lag(order_dt) over (partition by id order by order_dt) as prev_order_dt,
             lag(order_dt, 2) over (partition by id order by order_dt) as prev_order_dt2
      from (select distinct o.id, trunc(order_dt) as order_dt
            from orders o
            where order_dt >= date '2020-01-01' and 
                  order_dt < date '2020-02-01'
           ) o
     ) o
where prev_order_dt = order_dt - interval '1' day and
      prev_order_dt2 = order_dt - interval '2' day;

Upvotes: 1

DRapp
DRapp

Reputation: 48169

why not join twice based on same following two days. As long as you have index on the customer's ID and date, the join should be optimized. Because the joins require match on the same starting date basis, it either finds or it doesn't. If not, it is left out of the result set.

select distinct 
      o1.id
   from
      orders o1
         JOIN orders o2
           on o1.id = o2.id
           AND o1.order_dt = o2.order_dt - interval '1' day
         JOIN orders o3
           on o1.id = o3.id
           AND o1.order_dt = o3.order_dt - interval '2' day

Upvotes: 1

astentx
astentx

Reputation: 6751

You can use two window functions to calculate difference between consequtive dates and sliding window with ROWS offset to count distinct preceiding consequtive days. Example here:

with gen as (
  select 1 as cust_id, (date '2020-01-10') + 1 as q from dual union all
  select 1, (date '2020-01-10') + 2 as q from dual union all
  select 1, (date '2020-01-10') + 3 as q from dual union all
  select 1, (date '2020-01-10') + 3 as q from dual union all
  select 1, (date '2020-01-10') + 5 as q from dual union all
  select 1, (date '2020-01-10') + 7 as q from dual union all
  select 1, (date '2020-01-10') + 8 as q from dual union all
  select 1, (date '2020-01-10') + 9 as q from dual
)
, diff as (
  select gen.*
   , q - lag(q) over(partition by cust_id, trunc(q, 'mm') order by q asc) as datediff
  from gen
)
, window as (
  select diff.*
    , sum(decode(datediff, 1, 1, 0)) over(partition by cust_id, trunc(q, 'mm') order by q asc range between 2 preceding and current row) as cnt
  from diff
)
select sum(count(distinct q)) as cnt
from window
where cnt = 2
group by cust_id

Upvotes: 1

Related Questions