Leighton Ritchie
Leighton Ritchie

Reputation: 501

SQL Filter to only consecutive numbers

I have a table that's ordered by timestamp, and I only want to keep steps of a consecutive order (marked with * below).
In imperative programming, it would be:

prev_step = 0
output = []
for step in table.steps:  # already sorted by timestamp
  if step == prev_step + 1:
    output.append(step)  # desired row
    prev_step = step

Original table that I have (desired rows decorated with *, not actually data):

| timestamp | step |
| --------- | ---- |
| 100000001 | 5    |
| 100000002 | 1    |*
| 100000003 | 1    | ^
| 100000004 | 2    |*
| 100000005 | 2    | ^
| 100000006 | 4    |
| 100000007 | 5    |
| 100000008 | 3    |*
| 100000009 | 4    |*
| 100000010 | 2    |
| 100000011 | 5    |*
| 100000012 | 7    |

What I want:

| timestamp | step |
| --------- | ---- |
| 100000002 | 1    |*
| 100000004 | 2    |*
| 100000008 | 3    |*
| 100000009 | 4    |*
| 100000011 | 5    |*

I've only managed to come up with a WHERE step - LAG(step) OVER (ORDER BY timestamp) <> 0, but that only removes adjacent duplicates (marked in ^ above). It certainly helps, but isn't quite there yet.

Thanks in advance!

Upvotes: 2

Views: 584

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522646

Here is a solution which relies on a correlated subquery to detect the correct record to be retained at each step.

WITH cte AS (
    SELECT t1.*, (SELECT COUNT(DISTINCT t2.step) FROM yourTable t2
                 WHERE t2."timestamp" < t1."timestamp" AND t2.step < t1.step) AS cnt
    FROM yourTable t1
),
cte2 AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY cnt ORDER BY step, "timestamp") rn
    FROM cte t
)

SELECT t1."timestamp", t1.step, t1.rn, t1.cnt
FROM cte2 t1
WHERE rn = 1 AND (step = 1 OR EXISTS (SELECT 1 FROM yourTable t2
                                      WHERE t2.step = t1.step - 1))
ORDER BY "timestamp";

screen capture from demo link below

Demo

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1270883

One approach to this is a recursive CTE. Unfortunately, there are limits on recursive CTEs. So, one method is to generate every path through the data by step-order. Then choose the smallest timestamp for each step:

with cte(ts, step) as (
      (select ts, step
      from t
      where step = 1
      order by ts
      fetch first 1 row only)
      union all
      select t.ts, t.step
      from cte join
           t
           on t.ts >= cte.ts and t.step = cte.step + 1
     )
select *
from (select cte.*,
             row_number() over (partition by step order by ts) as seqnum
      from cte
     ) cte
where seqnum = 1;

Here is a db<>fiddle.

Upvotes: 2

Related Questions