user2823030
user2823030

Reputation: 109

Determining total of consecutive values using SQL

I would like to determine the number of consecutive absences as per the following table. Initial research suggests I may be able to achieve this using a window function. For the data provided, the longest streak is four consecutive occurrences. Please can you advise how I can set a running absence total as a separate column.

create table events (eventdate date, absence int);

insert into events values ('2014-10-01', 0);
insert into events values ('2014-10-08', 1);
insert into events values ('2014-10-15', 1);
insert into events values ('2014-10-22', 0);
insert into events values ('2014-11-05', 0);
insert into events values ('2014-11-12', 1);
insert into events values ('2014-11-19', 1);
insert into events values ('2014-11-26', 1);
insert into events values ('2014-12-03', 1);
insert into events values ('2014-12-10', 0);

Upvotes: 2

Views: 775

Answers (4)

devlin carnate
devlin carnate

Reputation: 8591

Based on Gordon Linhoff's answer here, you could do:

SELECT TOP 1
        MIN(eventdate) AS spanStart ,
        MAX(eventdate) AS spanEnd,
        COUNT(*) AS spanLength
FROM    ( SELECT    e.* ,
                    ( ROW_NUMBER() OVER ( ORDER BY eventdate )
                      - ROW_NUMBER() OVER ( PARTITION BY absence ORDER BY eventdate ) ) AS grp
          FROM      #events e
        ) t
GROUP BY grp ,
        absence
HAVING  absence = 1
ORDER BY COUNT(*) DESC;

Which returns:

spanStart   | spanEnd   | spanLength
---------------------------------------
2014-11-12  |2014-12-03 | 4

Upvotes: 3

CodyMR
CodyMR

Reputation: 415

I don't know what your DBMS is but this is from SQLServer. Hopefully it is of some help : )

-------------------------------------------------------------------------------------------
Query:

--tableRN is used to get the rownumber
;with tableRN as (SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.event) as rn, COUNT(*) as maxRN
                 FROM absence a GROUP BY a.event, a.absence),

--cte is a recursive function that returns the...
--absence value, the level (amount of times 1 appeared in a row)
--rn (row number), total (total count
cte (absence, level, rn, total) AS (
SELECT 0, 0, 1, 0
UNION ALL 
SELECT r.absence, 
       CASE WHEN c.absence = 1 AND r.absence = 1 THEN level + 1
                                                 ELSE 0
       END, 
       c.rn + 1, 
       CASE WHEN c.level = 1 THEN total + 1
                             ELSE total
       END
FROM cte c JOIN tableRN r ON c.rn + 1 = r.rn)

--This gets you the total count of times there 
--was a consective absent (twice or more in a row).
SELECT MAX(c.total) AS Count FROM cte c

-------------------------------------------------------------------------------------------
Results:

|Count|
+-----+
|  2  |

Upvotes: 0

pilcrow
pilcrow

Reputation: 58589

You don't specify which RDBMS you are using, but the following works under postgresql's window functions and should be translatable to similar SQL engines:

SELECT eventdate,
       absence,
       -- XXX We take advantage of the fact that absence is an int (1 or 0)
       --     otherwise we'd COUNT(1) OVER (...) and only conditionally
       --     display the count if absence = 1
       SUM(absence) OVER (PARTITION BY span ORDER BY eventdate)
         AS consecutive_absences
  FROM (SELECT spanstarts.*,
               SUM(newspan) OVER (ORDER BY eventdate) AS span
          FROM (SELECT events.*,
                CASE LAG(absence) OVER (ORDER BY eventdate)
                  WHEN absence THEN NULL
                  ELSE 1 END AS newspan
                  FROM events)
                spanstarts
        ) eventsspans
ORDER BY eventdate;

which gives you:

 eventdate  | absence | consecutive_absences 
------------+---------+----------------------
 2014-10-01 |       0 |                    0
 2014-10-08 |       1 |                    1
 2014-10-15 |       1 |                    2
 2014-10-22 |       0 |                    0
 2014-11-05 |       0 |                    0
 2014-11-12 |       1 |                    1
 2014-11-19 |       1 |                    2
 2014-11-26 |       1 |                    3
 2014-12-03 |       1 |                    4
 2014-12-10 |       0 |                    0

There is an excellent dissection of the above approach on the pgsql-general mailing list. The short of it is:

  1. Innermost query (spanstarts) uses LAG to find the start of new spans of absences, whether a span of 1's or a span 0's
  2. Next query (eventsspans) identifies those spans by summing the number of new spans that have come before us. So, we find span 1, then span 2, then 3, etc.
  3. The outer query the counts the number of absences in each span.

As the SQL comment says, we cheat a little bit on #3, taking advantage of its data type, but the net effect is the same.

Upvotes: 2

Create a new column called consecutive_absence_count with default 0.

You may write a SQL procedure for insert - Fetch the latest record, retrieve the absence value, identify if the new record to be inserted has a present or an absent value.

If they latest and the new record have consecutive dates and absence value set to 0, increment the consecutive_absence_count else set it to 0.

Upvotes: -1

Related Questions