gene
gene

Reputation: 338

SQL count of 90 day gaps between records

Say I have a Payment table. I need to know the number of times the gap between payments is greater than 90 days grouped by personID. Payment frequency varies. There is no expected number of payments. There could 0 or hundreds of payments in a 90 day period. If there was no payment for a year, that would count as 1. If there was a payment every month, the count would be 0. If there were 4 payments the first month, then a 90 day gap, then 2 more payments, then another 90 day gap, the count would be 2.

CREATE TABLE Payments
(
    ID int PRIMARY KEY,
    PersonID int FOREIGN KEY REFERENCES Persons(ID),
    CreateDate datetime
)

Upvotes: 1

Views: 1158

Answers (2)

JohnLBevan
JohnLBevan

Reputation: 24410

If you have SQL Server 2014, you can use the LAG or LEAD function to peek at other rows, making this easy:

Select PersonId, Sum(InfrequentPayment) InfrequentPayments
from
(
    select PersonId
    , case 
        when dateadd(day,@period,paymentdate) < coalesce(lead(PaymentDate) over (partition by personid order by PaymentDate),getutcdate()) 
        then 1 
        else 0 
      end InfrequentPayment
    from @Payment
) x
Group by PersonId

Demo: http://sqlfiddle.com/#!6/9eecb7d/491

Explanation:

The outer SQL is fairly trivial; we take the results of the inner SQL, group by PersonId, and count/sum the number of times they've paid payment judged as Infrequent.

The inner SQL is also simple; we're selecting every record, making a note of the person and whether that payment (or rather the delay after that payment) was judged infrequent.

The case statement determines what constitutes an infrequent payment. Here we say that if the record's paymentdate plus 90 days is still earlier than the next payment (or current date if it's the last payment, so there's no next payment) then it's infrequent (1); otherwise it's not (0).

The coalesce is simply there to handle the last record for a person; i.e. so that if there is no next payment the current date is used (thus capturing anyone who's last payment was over 90 days before today).

Now for the "clever" bit: lead(PaymentDate) over (partition by personid order by PaymentDate). LEAD is a new SQL function which lets you look at the record after the current one (LAG is to see the previous record).
If you're familiar with row_number() or rank() you may already understand what's going on here. To determine the record after the current one we don't look at the current query though; rather we specify an order by clause just for this function; that's what's in the brackets after the over keyword. We also want to only compare each person's payment dates with other payments made by them; not by any customer. To achieve that we use the partition by clause.

I hope that makes sense / meets your requirement. Please say if anything's unclear and I'll try to improve my explanation.


EDIT

For older versions of SQL, the same effect can be achieved by use or ROW_NUMBER and a LEFT OUTER JOIN; i.e.

;with cte (PersonId, PaymentDate, SequenceNo) as
(
    select PersonId
    , PaymentDate
    , ROW_NUMBER() over (partition by PersonId order by PaymentDate)
    from @Payment
)
select a.PersonId
, sum(case when dateadd(day,@period,a.paymentdate) < coalesce(b.paymentdate,getutcdate()) then 1 else 0 end) InfrequentPayments
from cte a
left outer join cte b 
on b.PersonId = a.PersonId 
and b.SequenceNo = a.SequenceNo + 1
Group by a.PersonId

Another method which should work on most databases (though less efficient)

select PersonId
, sum(InfrequentPayment) InfrequentPayments
from 
(
    select PersonId
    , case when dateadd(day,@period,paymentdate) < coalesce((
        select min(PaymentDate) 
        from @Payment b 
        where b.personid = a.personid
        and b.paymentdate > a.paymentdate
    ),getutcdate()) then 1 else 0 end InfrequentPayment
    from @Payment a
) x
Group by PersonId

Upvotes: 3

Bulat
Bulat

Reputation: 6969

Generic query for this problem given a timestamp field would be something like this:

SELECT p1.personID, COUNT(*)
FROM payments p1
JOIN payments p2 ON
  p1.timestamp < p2.timestamp
  AND p1.personID = p2.personID
  AND NOT EXISTS (-- exclude combinations of p1 and p2 where p exists between them
   SELECT * FROM payments p 
   WHERE p.personID = p1.personID
   AND p.timestamp > p1.timestamp 
   AND p.timestamp < p2.timestamp)
WHERE
  DATEDIFF(p2.timestamp, p1.timestamp) >= 90
GROUP BY p1.personID

Upvotes: 0

Related Questions