Reputation: 447
I am just wondering, if the following two sql statements have any performance differences or are they essentially the same:
To select a person's latest spouse record from spouse
table (person_id, spouse_id, marriage_date).
select *
from spouse
where (person_id, marriage_date) in ( select person_id, max(marriage_date)
from spouse
group by person_id
)
select *
from spouse s1
where marriage_date = ( select max(marriage_date)
from spouse s2
where s1.person_id = s2.person_id
)
This is a common requirement for reporting, for example, employees' latest job, highest education etc etc. I would like to know if you prefer the above statements one way over the other and why, or, if there are other better way (in terms of performance/readability) to go about these get-the-latest/maximum requirement.
Upvotes: 4
Views: 9214
Reputation: 13496
Your examples are syntactically the same. As a rule of thumb, use IN when the subquery is the most restraining (i.e. eliminates a lot of records from the result set) and use a correlated subquery when the outer query is the most restraining (i.e. the subquery has more records than the total result set).
Based on table statistics and estimated cost, the Oracle query optimizer might rewrite an IN clause to a correlated subquery and vice versa. Look at the explain plans for your situation and pick the plan with the least cost.
Upvotes: 0
Reputation: 254926
You can use analytic functions to do that:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY person_id
ORDER BY marriage_date DESC) AS r,
t.*
FROM spouse t) x
WHERE x.r = 1
And you can compare efficiency of all queries by looking at their execution plans.
Upvotes: 1
Reputation: 21882
As already alluded to in a previous answer, you can use SQL server windowing functions to achieve this quite nicely.
SELECT s1.person_id, s1.marriage_date
FROM spouse s1
JOIN (
SELECT
person_id,
ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY marriage_date DESC) AS Priority
FROM spouse
) s2
ON s2.person_id = s1.person_id AND s2.Priority = 1
In the joined query, we partition the data by person_id
, then apply a ranking function (ROW_NUMBER()
) which assigns a number to each row, in descending order of marriage_date
. The ranking is done independently for each person_id
, so the join condition s2.Priority = 1
means we get only the record with the max marriage_date
for each person.
You might find the following useful:
Upvotes: 4