Reputation: 109
I would like to determine the number of consecutive absences as per the following table. Initial research suggests I may be able to achieve this using a window function. For the data provided, the longest streak is four consecutive occurrences. Please can you advise how I can set a running absence total as a separate column.
create table events (eventdate date, absence int);
insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);
Upvotes: 2
Views: 775
Reputation: 8591
Based on Gordon Linhoff's answer here, you could do:
SELECT TOP 1
MIN(eventdate) AS spanStart ,
MAX(eventdate) AS spanEnd,
COUNT(*) AS spanLength
FROM ( SELECT e.* ,
( ROW_NUMBER() OVER ( ORDER BY eventdate )
- ROW_NUMBER() OVER ( PARTITION BY absence ORDER BY eventdate ) ) AS grp
FROM #events e
) t
GROUP BY grp ,
absence
HAVING absence = 1
ORDER BY COUNT(*) DESC;
Which returns:
spanStart | spanEnd | spanLength
---------------------------------------
2014-11-12 |2014-12-03 | 4
Upvotes: 3
Reputation: 415
I don't know what your DBMS is but this is from SQLServer. Hopefully it is of some help : )
-------------------------------------------------------------------------------------------
Query:
--tableRN is used to get the rownumber
;with tableRN as (SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.event) as rn, COUNT(*) as maxRN
FROM absence a GROUP BY a.event, a.absence),
--cte is a recursive function that returns the...
--absence value, the level (amount of times 1 appeared in a row)
--rn (row number), total (total count
cte (absence, level, rn, total) AS (
SELECT 0, 0, 1, 0
UNION ALL
SELECT r.absence,
CASE WHEN c.absence = 1 AND r.absence = 1 THEN level + 1
ELSE 0
END,
c.rn + 1,
CASE WHEN c.level = 1 THEN total + 1
ELSE total
END
FROM cte c JOIN tableRN r ON c.rn + 1 = r.rn)
--This gets you the total count of times there
--was a consective absent (twice or more in a row).
SELECT MAX(c.total) AS Count FROM cte c
-------------------------------------------------------------------------------------------
Results:
|Count|
+-----+
| 2 |
Upvotes: 0
Reputation: 58589
You don't specify which RDBMS you are using, but the following works under postgresql's window functions and should be translatable to similar SQL engines:
SELECT eventdate,
absence,
-- XXX We take advantage of the fact that absence is an int (1 or 0)
-- otherwise we'd COUNT(1) OVER (...) and only conditionally
-- display the count if absence = 1
SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
AS consecutive_absences
FROM (SELECT spanstarts.*,
SUM(newspan) OVER (ORDER BY eventdate) AS span
FROM (SELECT events.*,
CASE LAG(absence) OVER (ORDER BY eventdate)
WHEN absence THEN NULL
ELSE 1 END AS newspan
FROM events)
spanstarts
) eventsspans
ORDER BY eventdate;
which gives you:
eventdate | absence | consecutive_absences
------------+---------+----------------------
2014-10-01 | 0 | 0
2014-10-08 | 1 | 1
2014-10-15 | 1 | 2
2014-10-22 | 0 | 0
2014-11-05 | 0 | 0
2014-11-12 | 1 | 1
2014-11-19 | 1 | 2
2014-11-26 | 1 | 3
2014-12-03 | 1 | 4
2014-12-10 | 0 | 0
There is an excellent dissection of the above approach on the pgsql-general mailing list. The short of it is:
spanstarts
) uses LAG to find the start of new
spans of absences, whether a span of 1's or a span 0'seventsspans
) identifies those spans by summing the number of new spans that have come before us. So, we find span 1, then span 2, then 3, etc.As the SQL comment says, we cheat a little bit on #3, taking advantage of its data type, but the net effect is the same.
Upvotes: 2
Reputation: 846
Create a new column called consecutive_absence_count
with default 0.
You may write a SQL procedure for insert - Fetch the latest record, retrieve the absence value, identify if the new record to be inserted has a present or an absent value.
If they latest and the new record have consecutive dates and absence value set to 0
, increment the consecutive_absence_count
else set it to 0.
Upvotes: -1