wiltomap
wiltomap

Reputation: 4243

PostgreSQL query including WITH subquery

From the following table (named status), I need to extract town codes having a status of '01' by the end of year 2015. The column status_date stores the date at which a town changed its status.

gid  | town      | status  | status_date
-----+-----------+---------+-------------
1    | 86001     | 00      | 2000-01-01
2    | 86001     | 01      | 2016-03-01
3    | 86002     | 01      | 2000-01-01
4    | 86003     | 00      | 2000-01-01
5    | 86003     | 01      | 2015-03-01
6    | 86003     | 02      | 2015-09-01

I can achieve that with the following query which is a bit long:

WITH tab AS (SELECT town, MAX(status_date) FROM status GROUP BY town)

  SELECT 
    t.town

  FROM tab t
    LEFT JOIN status s ON t.town = s.town AND t.max = s.status_date 

  WHERE t.max < '2016-01-01' AND s.status = '01' ; 

The result is:

town   
-------
86002

Any idea on how to make this query simpler? Is the WITH essential?


To create the table for testing:

CREATE TABLE status (gid serial NOT NULL, town CHARACTER VARYING(5), status CHARACTER VARYING(2), status_date DATE) ;

INSERT INTO status (town, status, status_date) VALUES
  ('86001', '00', '2000-01-01'),
  ('86001', '01', '2016-03-01'),
  ('86002', '01', '2000-01-01'),
  ('86003', '00', '2000-01-01'),
  ('86003', '01', '2015-03-01'),
  ('86003', '02', '2015-09-01') ;

Upvotes: 5

Views: 7825

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You can do this with distinct on:

select s.*
from (select distinct on (s.town) s.*
      from status s
      where s.status_date < '2016-01-01'
      order by s.town, s.status_date desc
     ) s
where status = '01';

This query will get the latest status for each town up to the end of 2015. The outer query then selects those that are 01.

Upvotes: 3

Bozhidar Stoyneff
Bozhidar Stoyneff

Reputation: 3634

Your approach is correct - the CTE powered queries are more readable and when built correctly they can offer performance benefits.

Instead of selecting the town in your CTE, select the gid column. Then join the original table on it and voila:

WITH tab AS (
    SELECT gid, MAX(status_date) 
    FROM status 
    GROUP BY gid 
    HAVING MAX(status_date) < '2016-01-01'
)
SELECT s.whatever
FROM tab t
    INNER JOIN status s ON t.gid = s.sid
WHERE s.status = '01' 

EDIT

My apologies; I was in a hurry this morning so I wrote the query wrong. Now I had the time to analyze the problem deeper.

If the performance matters, then work it out around the PK like this.

WITH tab AS (
    SELECT MAX(gid) as ID
    FROM [status]
    WHERE YEAR(status_date) = 2015 AND status = '01'
    GROUP BY town
)
SELECT s.*
FROM tab t INNER JOIN status s ON t.ID = s.gid

This only works if the status_date grows along with the gid. Else you'll need to fallback to the original query you've posted and max on the date. However you can/should use INNER JOIN instead of LEFT JOIN:

WITH tab AS (  
    SELECT town, MAX(status_date) as Latest
    FROM [status]
    WHERE YEAR(status_date) = 2015 AND status = '01'
    GROUP BY town
)
SELECT s.*
FROM tab t INNER JOIN [status] s ON t.town = s.town AND t.Latest = s.status_date

Upvotes: 1

Tamas Rev
Tamas Rev

Reputation: 7166

Your subquery contains a GROUP BY expression, whlist the outer query doesn't. So you have to use a subquery.

You could inline that, but that would only make the query more difficult to read. Your query is as simple as it can get.

Upvotes: 0

Related Questions