Reputation: 608
I need to find the missing numbers for a sequence set that are categorized by years and departments. For instance, I have the following set of information in a table:
╔══════╤══════╤═════╗
║ YEAR │ DEPT │ NUM ║
╠══════╪══════╪═════╣
║ 2016 │ 1 │ 1 ║
╟──────┼──────┼─────╢
║ 2016 │ 1 │ 2 ║
╟──────┼──────┼─────╢
║ 2016 │ 1 │ 4 ║
╟──────┼──────┼─────╢
║ 2016 │ 2 │ 10 ║
╟──────┼──────┼─────╢
║ 2016 │ 2 │ 12 ║
╟──────┼──────┼─────╢
║ 2016 │ 2 │ 13 ║
╟──────┼──────┼─────╢
║ 2015 │ 3 │ 6 ║
╟──────┼──────┼─────╢
║ 2015 │ 3 │ 8 ║
╟──────┼──────┼─────╢
║ 2015 │ 3 │ 9 ║
╟──────┼──────┼─────╢
║ 2015 │ 2 │ 24 ║
╟──────┼──────┼─────╢
║ 2015 │ 2 │ 26 ║
╟──────┼──────┼─────╢
║ 2015 │ 2 │ 27 ║
╚══════╧══════╧═════╝
Normally I would LEFT JOIN
onto a TALLY
table, but I am wanting to retain what YEAR
and DEPT
the missing values are in. An approach such as below is what I would normally utilize, but it I'm not sure how I can loop back the year and department that the missing value corresponds to, especially since the MIN
and MAX
value can differ perYEAR
and DEPT
.
DECLARE @MIN INT = (SELECT MIN(NUM) FROM DOCUMENTS)
DECLARE @MAX INT = (SELECT MAX(NUM) FROM DOCUMENTS)
SELECT
T.NUM AS 'MISSING'
FROM
TALLY T
LEFT JOIN DOCUMENTS D
ON T.NUM = DOCUMENTS.NUM
WHERE
D.NUM IS NULL
AND D.NUM BETWEEN @MIN AND @MAX
My expected output would look like:
╔══════╤══════╤═════════════╗
║ YEAR │ DEPT │ MISSING_NUM ║
╠══════╪══════╪═════════════╣
║ 2016 │ 1 │ 3 ║
╟──────┼──────┼─────────────╢
║ 2016 │ 2 │ 11 ║
╟──────┼──────┼─────────────╢
║ 2015 │ 3 │ 7 ║
╟──────┼──────┼─────────────╢
║ 2015 │ 2 │ 25 ║
╚══════╧══════╧═════════════╝
I'm thinking I might need to create a TALLY
table that contains YEAR
, DEPT
, and a NUM
column for each of those, but that would be billions of values as I have years from 1800-2016 and 15 different departments with a NUM
range from 1 to 100 million for some of those departments. So I'm not thinking that would be the most efficient/practical of methods.
Upvotes: 3
Views: 659
Reputation: 49260
One approach would be using a recursive cte, to generate all the numbers between the min and max num for a combination of year and dept. Thereafter, left join
on the generated numbers to find the missing ones.
with t1 as (select yr,dept,max(num) maxnum, min(num) minnum
from t
group by yr,dept)
,x as (select yr, dept, minnum, maxnum from t1
union all
select yr, dept, minnum+1, maxnum
from x
where minnum < maxnum
)
select x.yr,x.dept,x.minnum as missing_num
from x
left join t on t.yr=x.yr and t.dept=x.dept and t.num = x.minnum
where t.num is null
order by 1,2,3
Upvotes: 0
Reputation: 1269683
If it is only one value that might be missing, you can do:
select t.year, t.dept, t.num + 1
from t
where t.num < (select max(t2.num) from t t2 where t2.year = t.year and t2.dept = t.dept) and
not exists (select 1
from t t2
where t2.year = t.year and t2.dept = t.dept and
t.num + 1 = t2.num
);
In SQL Server 2012+, this can be simplified to:
select year, dept, num + 1 as num
from (select t.*, lead(num) over (partition by year, dept order by num) as next_num
from t
) t
where next_num <> num + 1; -- Note: this handles the final num where `next_num` is `NULL`
This approach can actually be generalized to find ranges that are missing. Assuming you are using SQL Server 2012+, then:
select year, dept, num + 1 as start_missing, next_num - 1 as end_missing
from (select t.*, lead(num) over (partition by year, dept order by num) as next_num
from t
) t
where next_num <> num + 1; -- Note: this handles the final num where `next_num` is `NULL`
Upvotes: 2