PicoDeGallo
PicoDeGallo

Reputation: 608

Find Missing Sequence Values With GROUP BY Criteria

I need to find the missing numbers for a sequence set that are categorized by years and departments. For instance, I have the following set of information in a table:

╔══════╤══════╤═════╗
║ YEAR │ DEPT │ NUM ║
╠══════╪══════╪═════╣
║ 2016 │ 1    │ 1   ║
╟──────┼──────┼─────╢
║ 2016 │ 1    │ 2   ║
╟──────┼──────┼─────╢
║ 2016 │ 1    │ 4   ║
╟──────┼──────┼─────╢
║ 2016 │ 2    │ 10  ║
╟──────┼──────┼─────╢
║ 2016 │ 2    │ 12  ║
╟──────┼──────┼─────╢
║ 2016 │ 2    │ 13  ║
╟──────┼──────┼─────╢
║ 2015 │ 3    │ 6   ║
╟──────┼──────┼─────╢
║ 2015 │ 3    │ 8   ║
╟──────┼──────┼─────╢
║ 2015 │ 3    │ 9   ║
╟──────┼──────┼─────╢
║ 2015 │ 2    │ 24  ║
╟──────┼──────┼─────╢
║ 2015 │ 2    │ 26  ║
╟──────┼──────┼─────╢
║ 2015 │ 2    │ 27  ║
╚══════╧══════╧═════╝

Normally I would LEFT JOIN onto a TALLY table, but I am wanting to retain what YEAR and DEPT the missing values are in. An approach such as below is what I would normally utilize, but it I'm not sure how I can loop back the year and department that the missing value corresponds to, especially since the MIN and MAX value can differ perYEAR and DEPT.

DECLARE @MIN INT = (SELECT MIN(NUM) FROM DOCUMENTS)
DECLARE @MAX INT = (SELECT MAX(NUM) FROM DOCUMENTS)

SELECT
    T.NUM AS 'MISSING'
FROM
    TALLY T
    LEFT JOIN DOCUMENTS D
        ON T.NUM = DOCUMENTS.NUM
WHERE
    D.NUM IS NULL
    AND D.NUM BETWEEN @MIN AND @MAX

My expected output would look like:

╔══════╤══════╤═════════════╗
║ YEAR │ DEPT │ MISSING_NUM ║
╠══════╪══════╪═════════════╣
║ 2016 │ 1    │ 3           ║
╟──────┼──────┼─────────────╢
║ 2016 │ 2    │ 11          ║
╟──────┼──────┼─────────────╢
║ 2015 │ 3    │ 7           ║
╟──────┼──────┼─────────────╢
║ 2015 │ 2    │ 25          ║
╚══════╧══════╧═════════════╝

I'm thinking I might need to create a TALLY table that contains YEAR, DEPT, and a NUM column for each of those, but that would be billions of values as I have years from 1800-2016 and 15 different departments with a NUM range from 1 to 100 million for some of those departments. So I'm not thinking that would be the most efficient/practical of methods.

Upvotes: 3

Views: 659

Answers (2)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

One approach would be using a recursive cte, to generate all the numbers between the min and max num for a combination of year and dept. Thereafter, left join on the generated numbers to find the missing ones.

with t1 as (select yr,dept,max(num) maxnum, min(num) minnum 
            from t 
            group by yr,dept)
,x as (select yr, dept, minnum, maxnum from t1
       union all
       select yr, dept, minnum+1, maxnum 
       from x 
       where minnum < maxnum
       )
select x.yr,x.dept,x.minnum as missing_num 
from x  
left join t on t.yr=x.yr and t.dept=x.dept and t.num = x.minnum
where t.num is null 
order by 1,2,3

Example with sample data

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269683

If it is only one value that might be missing, you can do:

select t.year, t.dept, t.num + 1
from t
where t.num < (select max(t2.num) from t t2 where t2.year = t.year and t2.dept = t.dept) and
      not exists (select 1
                  from t t2
                  where t2.year = t.year and t2.dept = t.dept and
                        t.num + 1 = t2.num
                 );

In SQL Server 2012+, this can be simplified to:

select year, dept, num + 1 as num
from (select t.*, lead(num) over (partition by year, dept order by num) as next_num
      from t
     ) t
where next_num <> num + 1;  -- Note:  this handles the final num where `next_num` is `NULL`

This approach can actually be generalized to find ranges that are missing. Assuming you are using SQL Server 2012+, then:

select year, dept, num + 1 as start_missing, next_num - 1 as end_missing
from (select t.*, lead(num) over (partition by year, dept order by num) as next_num
      from t
     ) t
where next_num <> num + 1;  -- Note:  this handles the final num where `next_num` is `NULL`

Upvotes: 2

Related Questions