alchn
alchn

Reputation: 447

Select the latest records per person

I am just wondering, if the following two sql statements have any performance differences or are they essentially the same:

To select a person's latest spouse record from spouse table (person_id, spouse_id, marriage_date).

select *
from spouse 
where (person_id, marriage_date) in ( select person_id, max(marriage_date) 
                                      from spouse  
                                      group by person_id
                                     )

select *
from spouse s1
where marriage_date = ( select max(marriage_date) 
                        from spouse s2  
                        where s1.person_id = s2.person_id  
                      )

This is a common requirement for reporting, for example, employees' latest job, highest education etc etc. I would like to know if you prefer the above statements one way over the other and why, or, if there are other better way (in terms of performance/readability) to go about these get-the-latest/maximum requirement.

Upvotes: 4

Views: 9214

Answers (3)

Martin Schapendonk
Martin Schapendonk

Reputation: 13496

Your examples are syntactically the same. As a rule of thumb, use IN when the subquery is the most restraining (i.e. eliminates a lot of records from the result set) and use a correlated subquery when the outer query is the most restraining (i.e. the subquery has more records than the total result set).

Based on table statistics and estimated cost, the Oracle query optimizer might rewrite an IN clause to a correlated subquery and vice versa. Look at the explain plans for your situation and pick the plan with the least cost.

Upvotes: 0

zerkms
zerkms

Reputation: 254926

You can use analytic functions to do that:

SELECT *
  FROM (SELECT ROW_NUMBER() OVER (PARTITION BY person_id
                                   ORDER BY marriage_date DESC) AS r,
               t.*
          FROM spouse t) x
 WHERE x.r = 1

And you can compare efficiency of all queries by looking at their execution plans.

Upvotes: 1

Winston Smith
Winston Smith

Reputation: 21882

As already alluded to in a previous answer, you can use SQL server windowing functions to achieve this quite nicely.

SELECT s1.person_id, s1.marriage_date
FROM spouse s1
JOIN ( 
    SELECT 
        person_id, 
        ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY marriage_date DESC) AS Priority
        FROM spouse
    ) s2
ON s2.person_id = s1.person_id AND s2.Priority = 1

In the joined query, we partition the data by person_id, then apply a ranking function (ROW_NUMBER()) which assigns a number to each row, in descending order of marriage_date. The ranking is done independently for each person_id, so the join condition s2.Priority = 1 means we get only the record with the max marriage_date for each person.

You might find the following useful:

Upvotes: 4

Related Questions