User1974
User1974

Reputation: 396

Using a WHERE clause subquery, select the greatest 1 per group with a tie-breaker

I have a RoadInsp table in a SQLite 3.38.2 database. I've put the data in a CTE for purpose of this question:

with roadinsp (objectid, asset_id, date_, condition) as (
values
(1, 1, '2016-04-01', 20),
(2, 1, '2019-03-01', 19),
(3, 1, '2022-01-01', 18),
  
(4, 2, '2016-04-01', 17),
(5, 2, '2022-01-01', 16),
  
(6, 3, '2022-03-01', 15),  --duplicate date
(7, 3, '2022-03-01', 14),  --duplicate date
(8, 3, '2019-01-01', 13)
)  
select * from roadinsp

objectid  asset_id      date_   condition
       1         1  2016-04-01         20
       2         1  2019-03-01         19
       3         1  2022-01-01         18

       4         2  2016-04-01         17
       5         2  2022-01-01         16

       6         3  2022-03-01         15
       7         3  2022-03-01         14
       8         3  2019-01-01         13

demo


I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.

I want to select the greatest 1 per group using a WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.

I can achieve that using a WHERE clause expression like this:

date_ = 
  (select max(subq.date_) from roadinsp subq where roadinsp.asset_id = subq.asset_id)

objectid  asset_id      date_   condition
       3         1  2022-01-01         18
       5         2  2022-01-01         16
       6         3  2022-03-01         15
       7         3  2022-03-01         14

That works, but it selects two rows for asset #3, since there are two rows with the same date for that asset.

So I want to break the tie by selecting the row that has the highest condition value. It would look like this:

objectid  asset_id  date_       condition
       3         1  2022-01-01         18
       5         2  2022-01-01         16
       6         3  2022-03-01         15  --this row has a higher condition value than the other duplicate row.
                                           --so the other duplicate row was omitted.

I only ever want to select one row per asset. So if there are duplicate condition values too, then it doesn't matter what condition is selected, as long as only one row is selected.


Using a WHERE clause subquery, how can I select the greatest 1 per group, and break ties using the row that has the greatest condition?

Upvotes: 3

Views: 209

Answers (3)

MatBailie
MatBailie

Reputation: 86775

I would use IN() with ORDER BY and LIMIT to check the primary key...

SELECT * 
  FROM roadinsp r
 WHERE objectid IN (
         SELECT objectid
           FROM roadinsp
          WHERE asset_id = r.asset_id
       ORDER BY date_ DESC, condition DESC
          LIMIT 1
       )

This is easily extended to n rows per asset. And additional conditions can easily be added to the ORDER BY.

Demo

Upvotes: 3

JHH
JHH

Reputation: 1499

REQUIREMENTS

  • I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.
  • Using a WHERE clause subquery, how can I select the greatest n per group, and break ties with the row that has the largest condition?

BUSINESS RULES

  • For each asset_id, pick the one with most recent date_
  • If there are more than one row on the same date_, pick the one with largest condition

SOLUTION

  • Concatenate date_ and condition into one and then pick the max() as below:
  • Add leading 0s depends on max number of digits of condition
select * 
  from roadinsp
 where date_ || substr('00'||condition,-2,2) = 
  (select max(subq.date_ || substr('00'||condition,-2,2)) from roadinsp subq where roadinsp.asset_id = subq.asset_id);

Outcome:

objectid|asset_id|date_     |condition|
--------+--------+----------+---------+
       3|       1|2022-01-01|       18|
       5|       2|2022-01-01|       16|
       6|       3|2022-03-01|       15|

UPDATE:

  • If there're more than one row with the same date_ and condition, add objectid to the list:
select * 
  from roadinsp
 where (date_ || substr('00'||condition,-2,2) || substr('00000000'||objectid,-8,8)) = 
  (select max(subq.date_ || substr('00'||condition,-2,2) || substr('00000000'||objectid,-8,8)) from roadinsp subq where roadinsp.asset_id = subq.asset_id);

Upvotes: -1

Stu
Stu

Reputation: 32614

Perhaps you could add another similar condition to the where clause (as I understand your question, that's all you can modify?):

select * 
from roadinsp r
where 
  date_ = (
    select max(subq.date_) 
    from roadinsp subq 
    where r.asset_id = subq.asset_id
 )
 and condition = (
   select Max(condition) 
   from roadinsp c 
   where c.date_ = r.date_ 
   and c.asset_id = r.asset_id
);

Updated Fiddle

Upvotes: 2

Related Questions