Mateus Viccari
Mateus Viccari

Reputation: 7699

Find gaps of a sequence in SQL without creating additional tables

I have a table invoices with a field invoice_number. This is what happens when i execute select invoice_number from invoice:

invoice_number
--------------
1
2
3
5
6
10
11

I want a SQL that gives me the following result:

gap_start | gap_end
4         | 4
7         | 9

How can i write a SQL to perform such query? I am using PostgreSQL.

Upvotes: 17

Views: 7108

Answers (3)

user330315
user330315

Reputation:

The name of this problem is the "Gaps and Islands problem" which can be done with any modern SQL, using window functions:

select invoice_number + 1 as gap_start, 
       next_nr - 1 as gap_end
from (
  select invoice_number, 
         lead(invoice_number) over (order by invoice_number) as next_nr
  from invoices
) nr
where invoice_number + 1 <> next_nr;

SQLFiddle: http://sqlfiddle.com/#!15/1e807/1

Walkthrough example here using row_number over partition and interval: Postgres Consecutive Days, gaps and islands, Tabibitosan

Upvotes: 37

tutuDajuju
tutuDajuju

Reputation: 10840

We can use a simpler technique to get all missing values first, by joining on a generated sequence column like so:

select series
from generate_series(1, 11, 1) series
left join invoices on series = invoices.invoice_number
where invoice_number is null;

This gets us the series of missing numbers, which can be useful on it's own in some cases.

To get the gap start/end range, we can instead join the source table with itself.

select invoices.invoice_number + 1 as start, 
       min(fr.invoice_number) - 1 as stop
from invoices
left join invoices r on invoices.invoice_number = r.invoice_number - 1
left join invoices fr on invoices.invoice_number < fr.invoice_number
where r.invoice_number is null
      and fr.invoice_number is not null
group by invoices.invoice_number,
         r.invoice_number;

dbfiddle: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=32c5f3c021b0f1a876305a2bd3afafc9

This is probably less optimised than the above solutions, but could be useful in SQL servers that don't support lead() function perhaps.


Full credit goes to this excellent page in SILOTA docs: http://www.silota.com/docs/recipes/sql-gap-analysis-missing-values-sequence.html

I highly recommend reading it, as it explains the solution step by step.

Upvotes: 6

Pavel Stehule
Pavel Stehule

Reputation: 45760

I found another query:

 select invoice_number + lag gap_start, 
        invoice_number + lead - 1 gap_end
    from (select invoice_number, 
                 invoice_number - lag(invoice_number) over w lag,
                 lead(invoice_number) over w - invoice_number lead 
              from invoices window w as (order by invoice_number)) x 
  where lag = 1 and lead > 1;

Upvotes: 1

Related Questions