Reputation: 4243
From the following table (named status
), I need to extract town codes having a status of '01'
by the end of year 2015. The column status_date
stores the date at which a town changed its status.
gid | town | status | status_date
-----+-----------+---------+-------------
1 | 86001 | 00 | 2000-01-01
2 | 86001 | 01 | 2016-03-01
3 | 86002 | 01 | 2000-01-01
4 | 86003 | 00 | 2000-01-01
5 | 86003 | 01 | 2015-03-01
6 | 86003 | 02 | 2015-09-01
I can achieve that with the following query which is a bit long:
WITH tab AS (SELECT town, MAX(status_date) FROM status GROUP BY town)
SELECT
t.town
FROM tab t
LEFT JOIN status s ON t.town = s.town AND t.max = s.status_date
WHERE t.max < '2016-01-01' AND s.status = '01' ;
The result is:
town
-------
86002
Any idea on how to make this query simpler? Is the WITH
essential?
To create the table for testing:
CREATE TABLE status (gid serial NOT NULL, town CHARACTER VARYING(5), status CHARACTER VARYING(2), status_date DATE) ;
INSERT INTO status (town, status, status_date) VALUES
('86001', '00', '2000-01-01'),
('86001', '01', '2016-03-01'),
('86002', '01', '2000-01-01'),
('86003', '00', '2000-01-01'),
('86003', '01', '2015-03-01'),
('86003', '02', '2015-09-01') ;
Upvotes: 5
Views: 7825
Reputation: 1269873
You can do this with distinct on
:
select s.*
from (select distinct on (s.town) s.*
from status s
where s.status_date < '2016-01-01'
order by s.town, s.status_date desc
) s
where status = '01';
This query will get the latest status for each town up to the end of 2015. The outer query then selects those that are 01
.
Upvotes: 3
Reputation: 3634
Your approach is correct - the CTE powered queries are more readable and when built correctly they can offer performance benefits.
Instead of selecting the town in your CTE, select the gid
column. Then join the original table on it and voila:
WITH tab AS (
SELECT gid, MAX(status_date)
FROM status
GROUP BY gid
HAVING MAX(status_date) < '2016-01-01'
)
SELECT s.whatever
FROM tab t
INNER JOIN status s ON t.gid = s.sid
WHERE s.status = '01'
My apologies; I was in a hurry this morning so I wrote the query wrong. Now I had the time to analyze the problem deeper.
If the performance matters, then work it out around the PK like this.
WITH tab AS (
SELECT MAX(gid) as ID
FROM [status]
WHERE YEAR(status_date) = 2015 AND status = '01'
GROUP BY town
)
SELECT s.*
FROM tab t INNER JOIN status s ON t.ID = s.gid
This only works if the status_date
grows along with the gid
. Else you'll need to fallback to the original query you've posted and max on the date. However you can/should use INNER JOIN
instead of LEFT JOIN
:
WITH tab AS (
SELECT town, MAX(status_date) as Latest
FROM [status]
WHERE YEAR(status_date) = 2015 AND status = '01'
GROUP BY town
)
SELECT s.*
FROM tab t INNER JOIN [status] s ON t.town = s.town AND t.Latest = s.status_date
Upvotes: 1
Reputation: 7166
Your subquery contains a GROUP BY
expression, whlist the outer query doesn't. So you have to use a subquery.
You could inline that, but that would only make the query more difficult to read. Your query is as simple as it can get.
Upvotes: 0