In PostgreSQL, return rows with unique values of one column based on the minimum value of another

Question

Background

I've got this PostgreSQL join that works pretty well for me:

select  m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id

As you can see, I'm joining two tables to bring in a combined_id, which I need for later analysis elsewhere.

The Goal

I'd like to write a query that does so by picking the combined_id that's got the lowest value of m.id next to it (along with the other variables too). This ought to result in a new table with unique/distinct values of combined_id.

The Problem

The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id has, on average, 3 different m.id's. I don't actually care about the m.id's; I care about getting a unique combined_id. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id for rows with the same combined_id.

What I've tried

I've consulted several posts on this and I feel like I'm fairly close. See for instance this one or this one. This other one does exactly what I need (with MAX instead of MIN) but he's asking for it in Unix Bash 😞

Here's an example of something I've tried:

select  m.id,
        m.zodiac_sign,
        m.favorite_color,
        m.state,
        c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))

This returns the error ERROR: aggregate functions are not allowed in WHERE.

Any ideas?

Tim Biegeleisen · Accepted Answer

Postgres's DISTINCT ON is probably the best approach here:

SELECT DISTINCT ON (c.combined_id)
    m.id,
    m.zodiac_sign,
    m.favorite_color,
    m.state,
    c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
    ON m.id = c.id
ORDER BY
    c.combined_id,
    m.id;

As for performance, the following index on the crosstable might speed up the query:

CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);

If used, the above index should let the join happen faster. Note that I cover the combined_id column, which is required by the select.

In PostgreSQL, return rows with unique values of one column based on the minimum value of another

Answers (1)

Related Questions