In one to many relationship, return distinct rows based on MIN value

Let's say a patient makes many visits. I want to write a query that returns distinct patient rows based on their earliest visit. For example, consider the following rows.

patients
-------------
id    name
1     Bob
2     Jim
3     Mary

visits
-------------
id    patient_id    visit_date    reference_number
1     1             6/29/14       09f3be26
2     1             7/8/14        34c23a9e
3     2             7/10/14       448dd90a

What I want to see returned by the query is:

id    name    first_visit_date    reference_number
1     Bob     6/29/14             09f3be26
2     Jim     7/10/14             448dd90a

What I've tried looks something like:

SELECT
  patients.id, 
  patients.name, 
  visits.visit_date AS first_visit_date, 
  visits.reference_number
FROM
  patients
INNER JOIN (
  SELECT
    *
  FROM
    visits
  ORDER BY
    visit_date
  LIMIT
    1
) visits ON
  visits.patient_id = patients.id

Adding the LIMIT causes the query to return 0 rows, but removing it causes the query to return duplicates. What's the trick here? I've also tried selecting MIN(visit_date) in the INNER JOIN but that's also returning dups.

Update

It's been suggested that this question is a duplicate, but to me it seems different because I'm doing this over two separate tables. The accepted answer on the other question suggests joining on y.max_total = x.total, which works if the table being joined is the same table being selected from. Additionally, I need to return other columns from the row with the MIN date, not just the date itself.

The answer I've accepted works great, however.

Upvotes: 1

Answers (4)

wildplasser

Reputation: 44240

Avoiding the DISTINCT ON(p.id), but using a plain old NOT EXISTS(...) instead

SELECT p.id, p.name
     , v.first_visit_date, v.reference_number
FROM patients p
JOIN visits v ON p.id = v.patient_id
    -- exclude all join-products that are not the first for a patient.
WHERE NOT EXISTS (
   SELECT *
   FROM visits nx
   WHERE nx.patient_id = v.patient_id
   AND ( nx.visit_date < v.visit_date
       OR (nx.visit_date = v.visit_date AND nx.id < v.id) -- tie-breaker condition
       )
   );

Upvotes: 3

Andrew

Reputation: 7880

Although you are using PostgreSQL, just in case it might be useful or "inspiring", here's the T-SQL version.

SELECT p.id, name, first_visit.visit_date as first_visit_date, v.reference_number as first_visit_reference_number
FROM patients p
INNER JOIN 
(
    SELECT patient_id, MIN(visit_date) AS visit_date
    FROM visits
    GROUP BY patient_id
) first_visit ON first_visit.patient_id = p.id
INNER JOIN visits v ON v.patient_id = p.id AND v.visit_date = first_visit.visit_date

Upvotes: 0

John Bollinger

Reputation: 180113

You want an aggregate query to reduce multiple rows of the visits table to one row per patient ID. LIMIT is a bit messy because it's not strictly relational, though you could probably make it work if you were sufficiently determined. If the query optimizer is decent enough then there should be no performance difference between a version using LIMIT (appropriately) and this:

SELECT
  patients.id, 
  patients.name, 
  visits.first_visit_date AS first_visit_date, 
  visits.reference_number
FROM
  patients
INNER JOIN (
  SELECT
    patient_id,
    MIN(visit_date) as first_visit_date
  FROM
    visits
  GROUP BY
    patient_id
) visits ON
  visits.patient_id = patients.id

I prefer the version with MIN() because it is clearer to me. Also, if it's clearer to me, then it stands a better chance of being clearer to the query optimizer, too.

Upvotes: 0

Clodoaldo Neto

Reputation: 125214

Use distinct on

select distinct on (p.id)
    p.id, 
    p.name, 
    v.visit_date as first_visit_date, 
    v.reference_number
from
    patients p
    inner join
    visits v on p.id = v.patient_id
order by p.id, v.visit_date

http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

Upvotes: 1

In one to many relationship, return distinct rows based on MIN value

Answers (4)

Related Questions