a_se
a_se

Reputation: 11

Find customers with at least 5 transactions in At most 3 consecutive days

I have a table in SQL Server that contains customers' transactions From 2022-02-10 to 2022-03-10.

I want to find customers that have at least 5 transactions on At most three consecutive days

For example, output of below table should be CustomerId = 2 and customerid=3

Id CustomerId Transactiondate
1 1 2022-03-01
2 1 2022_03_01
3 1 2022_03_05
4 1 2022_03_07
5 1 2022_03_07
6 2 2022_03_05
7 2 2022_03_05
8 2 2022_03_06
9 2 2022_03_06
10 2 2022_03_07
1 3 2022-03-01
2 3 2022_03_01
3 3 2022_03_01
4 3 2022_03_03
5 3 2022_03_03

I tried this query but it doesn't have good performance for a large table:

select distinct p1.customerid
from trntbl p1
join trntbl p2 on p2.id <> p1.id
               and p2.customerid = p1.customerid
               and p2.TransactionDate >= p1.TransactionDate
               and p2.TransactionDate < date_add(day, 3, p1.prchasedate)
group by p1.customerid, p1.id
having count(*) >= 4

Upvotes: 1

Views: 1374

Answers (4)

Stu
Stu

Reputation: 32629

This is actually a gaps and islands problem, you can solve by using analytic window functions to subtract sequential row_number from consecutive days and then grouping, after first "plugging" any gaps with the help of a numbers table.

with numbers as (select top(20) Row_Number() over(order by (select null))-1 n from master.dbo.spt_values),
dRanges as (
  select customerId,
        Min(Transactiondate) CustStartDate, 
        Max(Transactiondate) CustEndDate 
  from t
  group by CustomerId
), dates as (
    select * 
    from dranges r
    outer apply (
        select DateAdd(day,n,r.CustStartDate) SeqDate
        from numbers n
        where DateAdd(day,n,r.CustStartDate) < = r.CustEndDate 
    )d
), q as (
    select customerId, transactiondate, Count(*) qty
    from t
    group by CustomerId, Transactiondate
), g as (
    select d.CustomerId, d.SeqDate, IsNull(q.qty,0)Qty, 
      DateAdd(day, - row_number() over (partition by d.customerid order by d.SeqDate), d.SeqDate) as dGrp
    from dates d
    left join q on q.Transactiondate = d.SeqDate and q.CustomerId = d.CustomerId
)
select customerId
from g
group by CustomerId, dGrp
having Count(*) <= 3 and Sum(qty) >= 5

DB<>Fiddle

Upvotes: 1

Coder1991
Coder1991

Reputation: 735

You could make use of datediff function and verify if the sum of the date differences are between 3 and 5 (provided the max of the differences is just 1) since the dates might be unique (for example customerid 2 can have transaction dates as 5,6,7,8,9 of March 2022) and this should be taken into account too.

declare @tbl table(id int identity,customerid int,transactiondate date)

insert into @tbl(customerid,transactiondate)
values(1,'2022-03-01')
,(1,'2022-03-01')
,(1,'2022-03-05')
,(1,'2022-03-07')
,(1,'2022-03-07')
,(2,'2022-03-05')
,(2,'2022-03-05')
,(2,'2022-03-06')
,(2,'2022-03-06')
,(2,'2022-03-07')

select customerid from (
select *
,SUM(datediff)over(partition by customerid order by transactiondate)[sum]
,max(datediff)over(partition by customerid order by transactiondate)[max]
from( 
select customerid , transactiondate,
DATEDIFF(DAY
, 
case when LEAD(transactiondate,1)over(partition by customerid order by transactiondate) 
is null then 
LAG(transactiondate,1,transactiondate)
over(partition by customerid order by transactiondate)
else
transactiondate end

, case when LEAD(transactiondate,1)over(partition by customerid order by transactiondate) 
is null then 
transactiondate
else
LEAD(transactiondate,1,transactiondate)
over(partition by customerid order by transactiondate)end) as [datediff]
,ROW_NUMBER()over(partition by customerid order by transactiondate)rownum
from @tbl
)t
)t1
where t1.rownum = 5
and t1.max = 1
and t1.sum between 3 and 5

Upvotes: 0

Charlieface
Charlieface

Reputation: 72087

Although this is a gaps-and-islands problem, there are shortcuts you can take.

You can group it up by date, then get the row 2 previous, and filter by only rows where the 2 previous row is exactly two days apart.

SELECT DISTINCT
  CustomerId
FROM (
    SELECT
      t.CustomerId,
      v.Date,
      Prev2 = LAG(v.Date, 2) OVER (PARTITION BY t.CustomerId ORDER BY v.Date)
    FROM YourTable t
    CROSS APPLY (VALUES( CAST(Transactiondate AS date) )) v(Date)
    GROUP BY
      t.CustomerId,
      v.Date
) t
WHERE DATEDIFF(day, t.Prev2, t.Date) = 2

db<>fiddle

If the base table only has a maximum of one row per date then you can forgo the GROUP BY.

Upvotes: 1

Zakaria
Zakaria

Reputation: 4796

If customers must have done transactions in three consecutive days (meaning that 5 transactions in a day then nothing in the next two days wouldn't count), then this can be done with two self joins:

with cte as
(select CustomerId, Transactiondate, count(*) ct
from table_name
group by CustomerId, Transactiondate)
select distinct t1.CustomerId
from cte t1  inner join cte t2
on t1.Transactiondate = dateadd(day, 1, t2.Transactiondate)
and t1.CustomerId = t2.CustomerId
inner join cte t3
on t2.Transactiondate = dateadd(day, 1, t3.Transactiondate)
and t3.CustomerId = t2.CustomerId
;

Fiddle

Upvotes: 2

Related Questions