Reputation: 1211
Background
I've got this PostgreSQL
join that works pretty well for me:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
As you can see, I'm joining two tables to bring in a combined_id
, which I need for later analysis elsewhere.
The Goal
I'd like to write a query that does so by picking the combined_id
that's got the lowest value of m.id
next to it (along with the other variables too). This ought to result in a new table with unique/distinct values of combined_id
.
The Problem
The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id
has, on average, 3 different m.id
's. I don't actually care about the m.id
's; I care about getting a unique combined_id
. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id
for rows with the same combined_id
.
What I've tried
I've consulted several posts on this and I feel like I'm fairly close. See for instance this one or this one. This other one does exactly what I need (with MAX
instead of MIN
) but he's asking for it in Unix Bash 😞
Here's an example of something I've tried:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))
This returns the error ERROR: aggregate functions are not allowed in WHERE
.
Any ideas?
Upvotes: 1
Views: 3096
Reputation: 520978
Postgres's DISTINCT ON
is probably the best approach here:
SELECT DISTINCT ON (c.combined_id)
m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
ON m.id = c.id
ORDER BY
c.combined_id,
m.id;
As for performance, the following index on the crosstable might speed up the query:
CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);
If used, the above index should let the join happen faster. Note that I cover the combined_id
column, which is required by the select.
Upvotes: 3