Johonn
Johonn

Reputation: 149

SQL Selecting all but newest result per id

I need to set a "waived" flag in my table for all but the newest result per id. I thought I had a query that will work here, but when I run a select on the query, I'm getting incorrect results - I saw one case where it selected both of the only two results for a particular id. I'm also getting multiple results with the same exact data. What am I doing wrong here?

Here's my select statement:

select t.test_row_id, t.test_result_id, t.waived, t.pass, t.comment
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
order by t.test_row_id

Here's the actual query I want to run:

update EV.Test_Result 
set waived = 1
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0

Upvotes: 0

Views: 94

Answers (2)

clifton_h
clifton_h

Reputation: 1298

If I understand this correctly, you are having problems because the Cardinality of the ON predicate returns all matching rows.

       EV.Test_Result.test_row_id = t.test_row_id 
   and EV.Test_Result.start_time < t.start_time

This ON will compare all of the start_time values that have the same id and return every combination of result sets where start_time is lesser than the t.start_time. Clearly, this is not what you want.

                    and t.device_id = 1219 
                    and t.waived = 0

This is actually a predicate (ON technically is one), but I would prefer to use this in a subquery/CTE for several reasons: You limit the number of rows SQL has to retrieve and compare.

Something like the following might be what you needed:

SELECT A.test_row_id
     , A.test_result_id
     , A.waived
     , A.pass
     , A.comment
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
                 , test_row_id
            FROM EV.Test_Result
            WHERE device_id = 1219
              AND waived = 0
            GROUP BY test_row_id
                            ) AS T ON A.test_row_id = T.test_row_id
                                  AND A.start_time < T.start_time
ORDER BY A.test_row_id

This query then returns a 1:M relationship between the values in the ON predicate, unlike the M:M query you had run.

UPDATE: Since I sheepishly screwed up trying to alter my Query on SO, I'll redeem myself by explaining the physical and logical orders of basic SQL Query operators:

As you know, you write a simple SELECT statement like the following:

SELECT <aggregate column>, SUM(<non-aggregate column>) AS Cost
FROM  <table_name>
WHERE <column> = 'some_value'
GROUP BY <aggregate column>
HAVING SUM(<non-aggregate column>) > some_value
ORDER BY <column>

Note that if you use a aggregate function, all other columns MUST appear in the GROUP BY or another function.

Now, SQL Server requires them to be written in that order although it actually processes this logically by the following order that is worth memorizing:

  • FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY

There are more details found on SELECT - MSDN, but this is why any columns in the SELECT operator must be in the group by or in a aggregate function (SUM, MIN, MAX, etc)...and also why my lazy code failed on your first attempt. :/ Note also that the ORDER BY is last (technically TOP operator occurs after this), and that without it the result is not deterministic unless a function such as DENSE_RANK enforces it (thought this occurs in the SELECT statement).

Hope this helps solve the problem and better yet how SQL works. Cheers

Upvotes: 1

Leo
Leo

Reputation: 898

Can you try ROW_NUMBER () function order by timestamp descending and filtering out values having ROW_NUMBER 1 ;

Below query should fetch all records per id except the latest one

I tried below query in Oracle with a table having fields : id,user_id, record_order adn timestamp and it worked :

   select 
          <table_name_alias>.* 
        from 
        (
     select 
    id,
    user_id, 
        row_number() over (partition by id order by record_order desc) as record_number 
    from 
    <your_table_name>
 ) <table_name_alias> 
        where 
    record_number <>1; 

If you are using Teradata DB, you can also try QUALIFY statement. I'm not sure if all DBs support this.

Select 
     table_name.* 
    from table_name 
    QUALIFY row_number() over (partition by id order by record_order desc) <>1;

Upvotes: 0

Related Questions