Reputation: 1655
So I have the following table with the schema:
CREATE TABLE stages (
id serial PRIMARY KEY,
cid VARCHAR(6) NOT NULL,
stage varchar(30) NOT null,
status varchar(30) not null,
);
with the following test data:
INSERT INTO stages (id, cid, stage, status) VALUES
('1', '1', 'first stage', 'accepted'),
('2', '1', 'second stage', 'current'),
('3', '2', 'first stage', 'accepted'),
('4', '3', 'first stage', 'accepted'),
('5', '3', 'second stage', 'accepted'),
('6', '3', 'third stage', 'current')
;
Now the use case is that we want to query this table for each stage for example we will query this table for the 'first stage' and then try to fetch all those cids which do not exist in the subsequent stage for example the 'second stage':
Result Set:
cid | status
2 | 'accepted'
While running the query for the 'second stage', we will try to fetch all those cids that do not exist in the 'third stage' and so on.
Result Set:
cid | status
1 | 'current'
Currently, we do this by making an exists subquery in the where clause which is not very performant.
The question is that is there a better alternative approach to the one we're currently using or do we need to focus on optimizing this current approach only? Also, what further optimizations can we do to make the exists subquery more performant?
Thanks!
Upvotes: 0
Views: 358
Reputation: 4503
CREATE TABLE stages (
id serial PRIMARY KEY
, cid VARCHAR(6) NOT NULL
, stage varchar(30) NOT null
, status varchar(30) not null
, UNIQUE ( cid, stage)
);
INSERT INTO stages (id, cid, stage, status) VALUES
(1, '1', 'first stage', 'accepted'),
(2, '1', 'second stage', 'current'),
(3, '2', 'first stage', 'accepted'),
(4, '3', 'first stage', 'accepted'),
(5, '3', 'second stage', 'accepted'),
(6, '3', 'third stage', 'current')
;
ANALYZE stages;
-- You can fetch all (three) stages with one query
-- Luckily, {'first', 'second', 'third'} are ordered alphabetically ;-)
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT * FROM stages q
WHERE NOT EXISTS (
SELECT * FROM stages x
WHERE x.cid = q.cid AND x.stage > q.stage
);
-- Some people dont like EXISTS, or think that it is slow.
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT q.*
FROM stages q
JOIN (
SELECT id
, row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
FROM stages x
)x ON x.id = q.id AND x.rn = 1;
Upvotes: 1
Reputation: 1269703
You can use lead()
:
select s.*
from (select s.*,
lead(stage) over (partition by cid order by id) as next_stage
from stages s
) s
where stage = 'first stage' and next_stage is null;
Upvotes: 1