Reputation: 16291
Here is a simplified description of 2 tables:
CREATE TABLE jobs(id PRIMARY KEY, description);
CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
There may be one or more dates per job.
I would like create a query which generates the following (in pidgin):
jobs.id, jobs.description, min(dates.date) as start, max(dates.date) as finish
I have tried something like this:
SELECT id, description,
(SELECT min(date) as start FROM dates d WHERE d.job=j.id),
(SELECT max(date) as finish FROM dates d WHERE d.job=j.id)
FROM jobs j;
which works, but looks very inefficient.
I have tried an INNER JOIN
, but can’t see how to join jobs
with a suitable aggregate query on dates
.
Can anybody suggest a clean efficient way to do this?
Upvotes: 0
Views: 214
Reputation: 32199
In addition to Erwin's solution, you can also use a window clause:
SELECT j.id, j.description,
first_value(d.date) OVER w AS start,
last_value(d.date) OVER w AS finish
FROM jobs j
JOIN dates d ON d.job = j.id
WINDOW w AS (PARTITION BY j.id ORDER BY d.date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
Window functions effectively group by one or more columns (the PARTITION BY
clause) and/or ORDER BY
some other columns and then you can apply some window function to it, or even a regular aggregate function, without affecting grouping or ordering of any other columns (description
in your case). It requires a somewhat different way of constructing queries, but once you get the idea it is pretty brilliant.
In your case you need to get the first value of a partition, which is easy because it is accessible by default. You also need to look beyond the window frame (which ends by default with the current row) to the last value in the partition and then you need the ROWS
clause. Since you produce two columns using the same window definition, the WINDOW
clause is used here; in case it applies to a single column you can just write the window function in the select list followed by the OVER
clause and the window definition without its name (WINDOW w AS (...)
).
Upvotes: 1
Reputation: 656706
While retrieving all rows: aggregate first, join later:
SELECT id, j.description, d.start, d.finish
FROM jobs j
LEFT JOIN (
SELECT job AS id, min(date) AS start, max(date) AS finish
FROM dates
GROUP BY job
) d USING (id);
Related:
JOIN .. USING
It's not a "different type of join". USING (col)
is a standard SQL (!) syntax shortcut for ON a.col = b.col
. More precisely, quoting the manual:
The
USING
clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joiningT1
andT2
withUSING (a, b)
produces the join conditionON *T1*.a = *T2*.a AND *T1*.b = *T2*.b
.Furthermore, the output of
JOIN USING
suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. WhileJOIN ON
produces all columns fromT1
followed by all columns fromT2
,JOIN USING
produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns fromT1
, followed by any remaining columns fromT2
.
It's particularly convenient that you can write SELECT * FROM ...
and joining columns are only listed once.
Upvotes: 1