Manngo
Manngo

Reputation: 16291

SELECT fields from one table with aggregates from related table

Here is a simplified description of 2 tables:

CREATE TABLE jobs(id PRIMARY KEY, description);
CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);

There may be one or more dates per job.

I would like create a query which generates the following (in pidgin):

jobs.id, jobs.description, min(dates.date) as start, max(dates.date) as finish

I have tried something like this:

SELECT id, description,
      (SELECT min(date) as start  FROM dates d WHERE d.job=j.id),
      (SELECT max(date) as finish FROM dates d WHERE d.job=j.id)
FROM jobs j;

which works, but looks very inefficient.

I have tried an INNER JOIN, but can’t see how to join jobs with a suitable aggregate query on dates.

Can anybody suggest a clean efficient way to do this?

Upvotes: 0

Views: 214

Answers (2)

Patrick
Patrick

Reputation: 32199

In addition to Erwin's solution, you can also use a window clause:

SELECT j.id, j.description,
       first_value(d.date) OVER w AS start,
       last_value(d.date) OVER w AS finish
FROM jobs j
JOIN dates d ON d.job = j.id
WINDOW w AS (PARTITION BY j.id ORDER BY d.date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);

Window functions effectively group by one or more columns (the PARTITION BY clause) and/or ORDER BY some other columns and then you can apply some window function to it, or even a regular aggregate function, without affecting grouping or ordering of any other columns (description in your case). It requires a somewhat different way of constructing queries, but once you get the idea it is pretty brilliant.

In your case you need to get the first value of a partition, which is easy because it is accessible by default. You also need to look beyond the window frame (which ends by default with the current row) to the last value in the partition and then you need the ROWS clause. Since you produce two columns using the same window definition, the WINDOW clause is used here; in case it applies to a single column you can just write the window function in the select list followed by the OVER clause and the window definition without its name (WINDOW w AS (...)).

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656706

While retrieving all rows: aggregate first, join later:

SELECT id, j.description, d.start, d.finish
FROM   jobs j
LEFT   JOIN (
   SELECT job AS id, min(date) AS start, max(date) AS finish 
   FROM   dates 
   GROUP  BY job
   ) d USING (id);

Related:

About JOIN .. USING

It's not a "different type of join". USING (col) is a standard SQL (!) syntax shortcut for ON a.col = b.col. More precisely, quoting the manual:

The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON *T1*.a = *T2*.a AND *T1*.b = *T2*.b.

Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.

It's particularly convenient that you can write SELECT * FROM ... and joining columns are only listed once.

Upvotes: 1

Related Questions