Reputation: 8240
Let's say a patient makes many visits. I want to write a query that returns distinct patient rows based on their earliest visit. For example, consider the following rows.
patients
-------------
id name
1 Bob
2 Jim
3 Mary
visits
-------------
id patient_id visit_date reference_number
1 1 6/29/14 09f3be26
2 1 7/8/14 34c23a9e
3 2 7/10/14 448dd90a
What I want to see returned by the query is:
id name first_visit_date reference_number
1 Bob 6/29/14 09f3be26
2 Jim 7/10/14 448dd90a
What I've tried looks something like:
SELECT
patients.id,
patients.name,
visits.visit_date AS first_visit_date,
visits.reference_number
FROM
patients
INNER JOIN (
SELECT
*
FROM
visits
ORDER BY
visit_date
LIMIT
1
) visits ON
visits.patient_id = patients.id
Adding the LIMIT
causes the query to return 0 rows, but removing it causes the query to return duplicates. What's the trick here? I've also tried selecting MIN(visit_date)
in the INNER JOIN but that's also returning dups.
Update
It's been suggested that this question is a duplicate, but to me it seems different because I'm doing this over two separate tables. The accepted answer on the other question suggests joining on y.max_total = x.total
, which works if the table being joined is the same table being selected from. Additionally, I need to return other columns from the row with the MIN date, not just the date itself.
The answer I've accepted works great, however.
Upvotes: 1
Views: 5035
Reputation: 44240
Avoiding the DISTINCT ON(p.id)
, but using a plain old NOT EXISTS(...)
instead
SELECT p.id, p.name
, v.first_visit_date, v.reference_number
FROM patients p
JOIN visits v ON p.id = v.patient_id
-- exclude all join-products that are not the first for a patient.
WHERE NOT EXISTS (
SELECT *
FROM visits nx
WHERE nx.patient_id = v.patient_id
AND ( nx.visit_date < v.visit_date
OR (nx.visit_date = v.visit_date AND nx.id < v.id) -- tie-breaker condition
)
);
Upvotes: 3
Reputation: 7880
Although you are using PostgreSQL, just in case it might be useful or "inspiring", here's the T-SQL version.
SELECT p.id, name, first_visit.visit_date as first_visit_date, v.reference_number as first_visit_reference_number
FROM patients p
INNER JOIN
(
SELECT patient_id, MIN(visit_date) AS visit_date
FROM visits
GROUP BY patient_id
) first_visit ON first_visit.patient_id = p.id
INNER JOIN visits v ON v.patient_id = p.id AND v.visit_date = first_visit.visit_date
Upvotes: 0
Reputation: 180113
You want an aggregate query to reduce multiple rows of the visits
table to one row per patient ID. LIMIT is a bit messy because it's not strictly relational, though you could probably make it work if you were sufficiently determined. If the query optimizer is decent enough then there should be no performance difference between a version using LIMIT (appropriately) and this:
SELECT
patients.id,
patients.name,
visits.first_visit_date AS first_visit_date,
visits.reference_number
FROM
patients
INNER JOIN (
SELECT
patient_id,
MIN(visit_date) as first_visit_date
FROM
visits
GROUP BY
patient_id
) visits ON
visits.patient_id = patients.id
I prefer the version with MIN() because it is clearer to me. Also, if it's clearer to me, then it stands a better chance of being clearer to the query optimizer, too.
Upvotes: 0
Reputation: 125214
Use distinct on
select distinct on (p.id)
p.id,
p.name,
v.visit_date as first_visit_date,
v.reference_number
from
patients p
inner join
visits v on p.id = v.patient_id
order by p.id, v.visit_date
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
Upvotes: 1