PostgreSQL: An alternative to subqueries to make the query more efficient?

Question

I have the following table:

CREATE TABLE stages (
  id     serial      PRIMARY KEY,
  cid    varchar(6)  NOT NULL,
  stage  varchar(30) NOT NULL,
  status varchar(30) NOT NULL
);

with the following test data:

INSERT INTO stages (id, cid, stage, status) VALUES
  (1, '1', 'first stage',  'accepted'),
  (2, '1', 'second stage', 'current'),
  (3, '2', 'first stage',  'accepted'),
  (4, '3', 'first stage',  'accepted'),
  (5, '3', 'second stage', 'accepted'),
  (6, '3', 'third stage',  'current');

The use case is to query this table one stage at a time. For example, we query for the 'first stage' and fetch all cids that have no row in the subsequent stage, the 'second stage':

Result Set:

cid | status
2   | 'accepted'

Likewise, when running the query for the 'second stage', we fetch all cids that have no row in the 'third stage', and so on.

Result Set:

cid | status
1   | 'current'

Currently, we do this with an EXISTS subquery in the WHERE clause, which does not perform well.
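
The query itself is not shown above. A minimal sketch of what such a subquery might look like, assuming the name of the next stage is passed in explicitly, is given below. The index name stages_cid_stage_idx is hypothetical, and whether the index helps depends on the real data volume and the plan PostgreSQL chooses.

```sql
-- Assumed form of the existing approach: rows in 'first stage'
-- whose cid has no row in 'second stage'.
SELECT s.cid, s.status
FROM stages s
WHERE s.stage = 'first stage'
  AND NOT EXISTS (
        SELECT 1
        FROM stages n
        WHERE n.cid = s.cid
          AND n.stage = 'second stage'
      );

-- Hypothetical supporting index: lets the subquery probe by (cid, stage)
-- instead of scanning the table for every outer row.
CREATE INDEX stages_cid_stage_idx ON stages (cid, stage);
```

Against the test data, this returns the row for cid 2 with status 'accepted', matching the first result set above. Running EXPLAIN ANALYZE on the real table is the way to confirm whether the index is actually used.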

Is there a better alternative to this approach, or should we focus on optimizing the current one? And what further optimizations would make the EXISTS subquery faster?

Thanks!

Gordon Linoff · Accepted Answer

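You can use lead(). For each row it returns the stage of the next row with the same cid, ordered by id, so a row whose next_stage is NULL is the latest stage that cid has reached: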

select s.*
from (select s.*,
             lead(stage) over (partition by cid order by id) as next_stage
      from stages s
     ) s
where stage = 'first stage' and next_stage is null;
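
As a sketch of how the same query covers the other case in the question, only the stage literal changes. This relies on an assumption that holds in the test data but is not stated in the question: within each cid, id increases in stage order, so "no later row" means "no next stage".

```sql
-- Same window-function query, filtered on 'second stage'.
-- Assumes ids increase with stage order within each cid.
select s.cid, s.status
from (select s.*,
             lead(stage) over (partition by cid order by id) as next_stage
      from stages s
     ) s
where stage = 'second stage' and next_stage is null;
```

Against the test data, this returns the row for cid 1 with status 'current', matching the second result set in the question. Unlike the subquery form, it does not need the name of the next stage, but it does read every row of stages to compute the window before filtering.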
