Reputation: 109
There's a common situation where we need to look in a single table and, for each key, get the record where another field has the highest value, latest date, or whatever. We can do this either with a subquery selecting the key and max value, then joining that back to the original table, or with an OLAP (analytic) function (RANK() or ROW_NUMBER() as appropriate).
I'm tuning a query which does this over a large dataset (40M records). Oracle's EXPLAIN PLAN doesn't always reflect true performance, so I'm hesitant to rely on it.
A simplistic example is below. I want to get the order number and miscellaneous other fields for every record where the order date matches the greatest one for that order.
-- Self-join version:
with ords as (
select 'AAA1' as ord_no, to_date('2023-01-01','yyyy-mm-dd') as ord_date, 'A' field1, 'B' field2 from dual union all
select 'AAA1' as ord_no, to_date('2023-02-01','yyyy-mm-dd') as ord_date, 'C' field1, 'D' field2 from dual union all
select 'AAA1' as ord_no, to_date('2023-03-01','yyyy-mm-dd') as ord_date, 'E' field1, 'F' field2 from dual union all
select 'AAA1' as ord_no, to_date('2023-03-01','yyyy-mm-dd') as ord_date, 'E1' field1, 'F1' field2 from dual union all
select 'BBB1' as ord_no, to_date('2023-01-01','yyyy-mm-dd') as ord_date, 'G' field1, 'H' field2 from dual union all
select 'BBB1' as ord_no, to_date('2023-02-01','yyyy-mm-dd') as ord_date, 'I' field1, 'J' field2 from dual union all
select 'BBB1' as ord_no, to_date('2023-03-01','yyyy-mm-dd') as ord_date, 'K' field1, 'L' field2 from dual
),
max_ord as (
select ord_no, max(ord_date) max_ord_date from ords group by ord_no
)
select mo.ord_no, mo.max_ord_date, o.field1, o.field2
from max_ord mo
join ords o on mo.ord_no = o.ord_no and mo.max_ord_date = o.ord_date
order by mo.ord_no;
-- OLAP version
-- same data setup as above, omitted for brevity
select ord_no, ord_date, field1, field2 from
(select ord_no, ord_date, field1, field2, rank() over (partition by ord_no order by ord_date desc) maxdt
from ords
)
where maxdt = 1 order by ord_no;
In both cases, the expected result would be something like this:
"ORD_NO" "ORD_DATE" "FIELD1" "FIELD2"
"AAA1" 3/1/2023 "E" "F"
"AAA1" 3/1/2023 "E1" "F1"
"BBB1" 3/1/2023 "K" "L"
Obviously, either approach works; I'm just having trouble finding any good references as to which (in general) would be more efficient. We're on Oracle; obviously other databases may perform quite differently.
The OLAP version seems, to me, a little clearer to read: it makes it more obvious what you're doing.
Any guidance or personal experiences would be welcome!
Upvotes: 2
Views: 100
Reputation: 11603
Which is better entirely depends on how many historical versions you want to skip.
If in your example most ORD_NO values have hundreds or thousands of ORD_DATEs, and particularly if the table is very wide (many bytes per row), then wanting only the last one is going to move you toward a subquery+index method so you don't have to scan the entire table: you're after only a tiny percentage of the table. In this case, an index on both columns (ORD_NO, ORD_DATE) could satisfy a GROUP BY query with a parallel index fast full scan and then nested loops to the table. Overhinting it primarily just to communicate the intended plan:
SELECT /*+ LEADING(x) USE_NL_WITH_INDEX(o) */
o.*
FROM (SELECT /*+ NO_MERGE INDEX_FFS(o) PARALLEL(8) */
ord_no,
MAX(ord_date) ord_date
FROM ordertable o -- will use only the index for the scan
GROUP BY ord_no) x,
ordertable o -- will index seek after only your target rows
WHERE x.ord_no = o.ord_no
AND x.ord_date = o.ord_date
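A minimal sketch of the composite index that plan assumes (the index name is just illustrative):
-- Hypothetical composite index; covers the GROUP BY so the aggregate
-- can be satisfied by an index fast full scan without touching the table
CREATE INDEX ordertable_no_date_ix ON ordertable (ord_no, ord_date);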
However, if there are only a small number of historical versions (fewer than a hundred or so ORD_DATE values per ORD_NO), or the table is very narrow (the index wouldn't be significantly smaller), then index access will hurt you: you'll find too many index entries requiring a table visit, each of which is an inefficient entire-block read just for one row, and the inefficiency adds up quickly at high volumes. You want a full table scan in this case, which makes the windowing option attractive because it avoids a second scan and doesn't need the index. You already have that SQL written properly (your second example), though you should/could hint it for parallelism. Windowing functions particularly benefit from parallelism: not only is the work spread across multiple slaves, but each slave can allocate up to the maximum allowable per-process PGA memory of 1G, which in aggregate is far more memory than a serialized query can use, reducing the need for slow(er) I/O to temp space for the sort workarea those functions require.
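A sketch of that hinted version, assuming the same ordertable as above (the degree of 8 is just illustrative):
SELECT /*+ PARALLEL(8) */
       ord_no,
       ord_date,
       field1,
       field2
FROM (SELECT ord_no,
             ord_date,
             field1,
             field2,
             RANK() OVER (PARTITION BY ord_no ORDER BY ord_date DESC) maxdt
      FROM ordertable) -- single full scan; no second visit to the table
WHERE maxdt = 1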
Because it's a matter of threshold, there are no hard and fast rules, only general guidelines. Ultimately a question like this is answered only by actually trying it both ways and seeing which one works best in your environment with your data. There's no "fastest way" to write SQL: what is bad in one situation is great in another. It's all about the relative number of rows per step and the resources available to the database in the access, join and sort methodologies it has to choose from to process those rows at each step. That means it's all about your data volume, cardinalities, join relationships, indexing, partitioning, compression, query predicates, caching, DB configuration, hardware, etc. Eventually you get to where you can predict how something might act, but you never really know until you actually try it in your environment against your data.
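One way to run that comparison is to execute each variant with the GATHER_PLAN_STATISTICS hint and then pull the actual (not estimated) row counts and timings from the cursor cache; a sketch:
-- Run the candidate query with actuals collection enabled.
-- The COUNT(*) wrapper avoids client fetch overhead, though note it
-- may change column projection versus the real query.
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM (SELECT ord_no,
             ord_date,
             RANK() OVER (PARTITION BY ord_no ORDER BY ord_date DESC) maxdt
      FROM ordertable)
WHERE maxdt = 1;

-- Then compare estimated vs. actual rows and time per plan step:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));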
Upvotes: 1