Reputation: 2585
I'm trying to query to calculate consecutive overdue days for some customers. I also have a primary key identifying the customers.
Sample Table
Date (d.m.y) Name
01.01.2014 Alex
02.01.2014 Alex
03.01.2014 Alex
01.01.2014 Bianca
02.01.2014 Bianca
08.07.2014 Alex
09.07.2014 Alex
10.07.2014 Alex
11.07.2014 Alex
How can I differentiate the names and only count consecutive days when making a SELECT COUNT()
?
Desired Result
Name Overdue Day Count Date
Alex 3 01.01.2014 <== The date is the first overdue date
Bianca 2 01.01.2014
Alex 4 08.07.2014
Upvotes: 3
Views: 2855
Reputation: 123654
Queries that use [NOT] EXISTS
can be slow, so here is a solution that might run a bit faster.
We start by creating a query that finds the starting date for each consecutive group of dates by name
SELECT t1.Date, t1.Name
FROM
T AS t1
LEFT JOIN
T AS t2
ON t1.Name=t2.Name
AND t1.Date=DateDiff("d",-1,t2.Date)
WHERE t2.Date IS NULL
It gives us
Date Name
---------- ------
2014-01-01 Alex
2014-01-01 Bianca
2014-07-08 Alex
We can link that query back to the main table with an unequal join
SELECT t3.Date, t3.Name
FROM
T AS t3
INNER JOIN
(
SELECT t1.Date, t1.Name
FROM
T AS t1
LEFT JOIN
T AS t2
ON t1.Name=t2.Name
AND t1.Date=DateDiff("d",-1,t2.Date)
WHERE t2.Date IS NULL
) AS StartDates
ON t3.Name=StartDates.Name AND t3.Date>=StartDates.Date
to produce the following, which repeats rows for each successive group
Date Name
---------- ------
2014-01-01 Alex
2014-01-02 Alex
2014-01-03 Alex
2014-01-01 Bianca
2014-01-02 Bianca
2014-07-08 Alex
2014-07-08 Alex
2014-07-09 Alex
2014-07-09 Alex
2014-07-10 Alex
2014-07-10 Alex
2014-07-11 Alex
2014-07-11 Alex
so if we tweak that into an aggregation query we can assign a group number for each row in the original table
SELECT t3.Date, t3.Name, COUNT(*) AS GroupNo
FROM
T AS t3
INNER JOIN
(
SELECT t1.Date, t1.Name
FROM
T AS t1
LEFT JOIN
T AS t2
ON t1.Name=t2.Name
AND t1.Date=DateDiff("d",-1,t2.Date)
WHERE t2.Date IS NULL
) AS StartDates
ON t3.Name=StartDates.Name AND t3.Date>=StartDates.Date
GROUP BY t3.Date, t3.Name
resulting in
Date Name GroupNo
---------- ------ -------
2014-01-01 Alex 1
2014-01-02 Alex 1
2014-01-03 Alex 1
2014-01-01 Bianca 1
2014-01-02 Bianca 1
2014-07-08 Alex 2
2014-07-09 Alex 2
2014-07-10 Alex 2
2014-07-11 Alex 2
Finally, we can wrap the whole thing in another aggregation query
SELECT
Grouped.Name,
COUNT(*) AS DaysOverdue,
MIN(Grouped.Date) AS OverdueSince
FROM
(
SELECT t3.Date, t3.Name, COUNT(*) AS GroupNo
FROM
T AS t3
INNER JOIN
(
SELECT t1.Date, t1.Name
FROM
T AS t1
LEFT JOIN
T AS t2
ON t1.Name=t2.Name
AND t1.Date=DateDiff("d",-1,t2.Date)
WHERE t2.Date IS NULL
) AS StartDates
ON t3.Name=StartDates.Name AND t3.Date>=StartDates.Date
GROUP BY t3.Date, t3.Name
) AS Grouped
GROUP BY Grouped.Name, Grouped.GroupNo
ORDER BY 3, 1
to produce the final result
Name DaysOverdue OverdueSince
------ ----------- ------------
Alex 3 2014-01-01
Bianca 2 2014-01-01
Alex 4 2014-07-08
Upvotes: 2
Reputation: 69769
This is a gaps and islands problem (specifically islands) - Unfortunately I think the only solution supported in access is very inefficient:
SELECT Name,
COUNT(*) AS Days,
MIN(Date) AS FirstDate,
MAX(Date) AS LastDate
FROM ( SELECT Name,
Date,
( SELECT MIN(B.Date)
FROM T AS B
WHERE B.Date >= A.Date
AND B.Name = A.Name
AND NOT EXISTS
( SELECT 1
FROM T AS C
WHERE C.Name = B.Name
AND C.Date = B.Date + 1
)
) AS grp
FROM T AS A
) AS D
GROUP BY Name, grp;
There is a full explanation in the article linked above, but the sub-queries used to create the column grp
find the end of each particular island, and then this value can be used to group the outer query.
Upvotes: 5