Reputation: 501
I have a table that's ordered by timestamp
, and I only want to keep step
s of a consecutive order (marked with *
below).
In imperative programming, it would be:
prev_step = 0
output = []
for step in table.steps: # already sorted by timestamp
if step == prev_step + 1:
output.append(step) # desired row
prev_step = step
Original table that I have (desired rows decorated with *
, not actually data):
| timestamp | step |
| --------- | ---- |
| 100000001 | 5 |
| 100000002 | 1 |*
| 100000003 | 1 | ^
| 100000004 | 2 |*
| 100000005 | 2 | ^
| 100000006 | 4 |
| 100000007 | 5 |
| 100000008 | 3 |*
| 100000009 | 4 |*
| 100000010 | 2 |
| 100000011 | 5 |*
| 100000012 | 7 |
What I want:
| timestamp | step |
| --------- | ---- |
| 100000002 | 1 |*
| 100000004 | 2 |*
| 100000008 | 3 |*
| 100000009 | 4 |*
| 100000011 | 5 |*
I've only managed to come up with a WHERE step - LAG(step) OVER (ORDER BY timestamp) <> 0
, but that only removes adjacent duplicates (marked in ^
above). It certainly helps, but isn't quite there yet.
Thanks in advance!
Upvotes: 2
Views: 584
Reputation: 522646
Here is a solution which relies on a correlated subquery to detect the correct record to be retained at each step.
WITH cte AS (
SELECT t1.*, (SELECT COUNT(DISTINCT t2.step) FROM yourTable t2
WHERE t2."timestamp" < t1."timestamp" AND t2.step < t1.step) AS cnt
FROM yourTable t1
),
cte2 AS (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY cnt ORDER BY step, "timestamp") rn
FROM cte t
)
SELECT t1."timestamp", t1.step, t1.rn, t1.cnt
FROM cte2 t1
WHERE rn = 1 AND (step = 1 OR EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.step = t1.step - 1))
ORDER BY "timestamp";
Upvotes: 2
Reputation: 1270883
One approach to this is a recursive CTE. Unfortunately, there are limits on recursive CTEs. So, one method is to generate every path through the data by step-order. Then choose the smallest timestamp for each step:
with cte(ts, step) as (
(select ts, step
from t
where step = 1
order by ts
fetch first 1 row only)
union all
select t.ts, t.step
from cte join
t
on t.ts >= cte.ts and t.step = cte.step + 1
)
select *
from (select cte.*,
row_number() over (partition by step order by ts) as seqnum
from cte
) cte
where seqnum = 1;
Here is a db<>fiddle.
Upvotes: 2