Reputation: 3162

Select most recent row based on distinct combination of two columns

I'm writing a cronjob that runs analysis on a flags table in my database, structured as such:

| id | item | def | time_flagged | time_resolved | status  |
+----+------+-----+--------------+---------------+---------+
| 1  | 1    | foo | 1519338608   | 1519620669    | MISSED  |
| 2  | 1    | bar | 1519338608   | (NULL)        | OPEN    |
| 3  | 2    | bar | 1519338608   | 1519620669    | IGNORED |
| 4  | 1    | foo | 1519620700   | (NULL)        | OPEN    |

For each distinct def, for each unique item, I want to get the "latest" row (by IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.

For the above data set, I would like to select:

| def | item | time       | status  |
+-----+------+------------+---------+
| foo | 1    | 1519620700 | OPEN    |
| bar | 1    | 1519338608 | OPEN    |
| bar | 2    | 1519620669 | IGNORED |

Row 1 is not included because it's "overridden" by row 4, as both rows have the same def-item combination, and the latter has a more recent time.

The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.

How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions, but I don't see any that involve my specific circumstance of needing "nested grouping" across two columns.

Upvotes: 3

Answers (4)

Aran K

Reputation: 289

-- Find the latest time per (def, item), then join back to fetch the matching row
select f.def, f.item, a.time, f.status
from flags f
join (select
      def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
      from flags
      group by def, item) a
on f.def = a.def and
   f.item = a.item and
   COALESCE(f.time_resolved, f.time_flagged) = a.time

Upvotes: 1

ravioli

Reputation: 3833

If you're on MySQL 8.0 or later, you can use a window function:

SELECT def, item, time, status
FROM (
  SELECT 
    def, 
    item,
    COALESCE(time_resolved, time_flagged) AS time,  -- "time" is derived, not a stored column
    status,
    RANK() OVER(PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) MyRank  -- Rank each (def, item) combination by "time"
  FROM MyTable
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping

If a (def, item) combo can have several rows with the same "time" value, change RANK() to ROW_NUMBER(). This guarantees you only get one row per grouping.
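
To make the tie behaviour concrete, here is a minimal sketch that runs both variants against the question's sample data. It makes several assumptions: SQLite (3.25 or later, via Python's sqlite3 module) stands in for MySQL 8.0, the table is named flags, row 5 is a hypothetical extra row added only to create a tie, and the `id DESC` tie-breaker is one arbitrary choice among many.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for MySQL 8.0; the window syntax used here is shared.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flags (id INTEGER PRIMARY KEY, item INTEGER, def TEXT, "
    "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
        (2, 1, "bar", 1519338608, None, "OPEN"),
        (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
        (4, 1, "foo", 1519620700, None, "OPEN"),
        # Hypothetical row: same (def, item) and same "time" as row 4.
        (5, 1, "foo", 1519338608, 1519620700, "IGNORED"),
    ],
)

QUERY = """
SELECT def, item, time, status
FROM (
    SELECT def, item, status,
           COALESCE(time_resolved, time_flagged) AS time,
           {fn}() OVER (PARTITION BY def, item ORDER BY {order}) AS rnk
    FROM flags
) src
WHERE rnk = 1
ORDER BY def, item, status
"""

by_time = "COALESCE(time_resolved, time_flagged) DESC"

# RANK() gives every tied row rank 1, so (foo, 1) comes back twice.
rank_rows = conn.execute(QUERY.format(fn="RANK", order=by_time)).fetchall()

# ROW_NUMBER() numbers tied rows 1, 2, ...; the id tie-breaker makes the pick deterministic.
row_number_rows = conn.execute(
    QUERY.format(fn="ROW_NUMBER", order=by_time + ", id DESC")
).fetchall()

print(len(rank_rows), len(row_number_rows))
```

Without an explicit tie-breaker in the ORDER BY, ROW_NUMBER() still returns one row per group, but which of the tied rows wins is not defined.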

Upvotes: 1

Gustavo Topete

Reputation: 1306

You could try:

select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
    select MAX(IFNULL(time_resolved, time_flagged))
    from flags B
    where A.item = B.item and A.def = B.def
)

I know it's not the best approach, but it might work for you.

Upvotes: 2

Golden Ratio

Reputation: 349

Do you mean 'for each unique Def and each unique Item'? If so, a group by on multiple columns seems like it would work (shown as a derived table t), joined back to the original table to grab the rest of the data:

select 
    f.def,
    f.item,
    t.time,
    f.status
from
flags f
join (select
    def,
    item,
    max(coalesce(time_resolved, time_flagged)) time  -- latest "time" per (def, item)
from flags
group by def, item) t
on 
    f.def=t.def and
    f.item=t.item and
    coalesce(f.time_resolved, f.time_flagged)=t.time
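
As a quick check, the group-by-and-join-back approach can be run against the question's sample data. This is only a sketch under assumptions: SQLite (via Python's sqlite3 module) stands in for MySQL, the table is named flags, and "time" is computed as COALESCE(time_resolved, time_flagged) because the table has no stored time column.

```python
import sqlite3

# Assumption: an in-memory SQLite table named "flags" mirrors the question's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flags (id INTEGER PRIMARY KEY, item INTEGER, def TEXT, "
    "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
        (2, 1, "bar", 1519338608, None, "OPEN"),
        (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
        (4, 1, "foo", 1519620700, None, "OPEN"),
    ],
)

rows = conn.execute(
    """
    SELECT f.def, f.item, t.time, f.status
    FROM flags f
    JOIN (
        -- Latest "time" for each (def, item) combination
        SELECT def, item, MAX(COALESCE(time_resolved, time_flagged)) AS time
        FROM flags
        GROUP BY def, item
    ) t
      ON f.def = t.def
     AND f.item = t.item
     AND COALESCE(f.time_resolved, f.time_flagged) = t.time
    ORDER BY f.id
    """
).fetchall()

for row in rows:
    print(row)
```

On this data the query returns rows 2, 3 and 4, which is the result set the question asks for. If two rows in the same (def, item) group share the maximum time, the join returns both of them.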

Upvotes: 1
