SQL Selecting all but newest result per id

Question

I need to set a "waived" flag in my table for all but the newest result per id. I thought I had a query that will work here, but when I run a select on the query, I'm getting incorrect results - I saw one case where it selected both of the only two results for a particular id. I'm also getting multiple results with the same exact data. What am I doing wrong here?

Here's my select statement:

select t.test_row_id, t.test_result_id, t.waived, t.pass, t.comment
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
order by t.test_row_id

Here's the actual query I want to run:

update EV.Test_Result 
set waived = 1
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0

clifton_h · Accepted Answer

If I understand this correctly, you are having problems because the Cardinality of the ON predicate returns all matching rows.

       EV.Test_Result.test_row_id = t.test_row_id 
   and EV.Test_Result.start_time < t.start_time

This ON will compare all of the start_time values that have the same id and return every combination of result sets where start_time is lesser than the t.start_time. Clearly, this is not what you want.

                    and t.device_id = 1219 
                    and t.waived = 0

This is actually a predicate (ON technically is one), but I would prefer to use this in a subquery/CTE for several reasons: You limit the number of rows SQL has to retrieve and compare.

Something like the following might be what you needed:

SELECT A.test_row_id
     , A.test_result_id
     , A.waived
     , A.pass
     , A.comment
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
                 , test_row_id
            FROM EV.Test_Result
            WHERE device_id = 1219
              AND waived = 0
            GROUP BY test_row_id
                            ) AS T ON A.test_row_id = T.test_row_id
                                  AND A.start_time < T.start_time
ORDER BY A.test_row_id

This query then returns a 1:M relationship between the values in the ON predicate, unlike the M:M query you had run.

UPDATE: Since I sheepishly screwed up trying to alter my Query on SO, I'll redeem myself by explaining the physical and logical orders of basic SQL Query operators:

As you know, you write a simple SELECT statement like the following:

SELECT , SUM() AS Cost
FROM  
WHERE  = 'some_value'
GROUP BY 
HAVING SUM() > some_value
ORDER BY

Note that if you use a aggregate function, all other columns MUST appear in the GROUP BY or another function.

Now, SQL Server requires them to be written in that order although it actually processes this logically by the following order that is worth memorizing:

FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY

There are more details found on SELECT - MSDN, but this is why any columns in the SELECT operator must be in the group by or in a aggregate function (SUM, MIN, MAX, etc)...and also why my lazy code failed on your first attempt. :/ Note also that the ORDER BY is last (technically TOP operator occurs after this), and that without it the result is not deterministic unless a function such as DENSE_RANK enforces it (thought this occurs in the SELECT statement).

Hope this helps solve the problem and better yet how SQL works. Cheers

SQL Selecting all but newest result per id

Answers (2)

Related Questions