Zolt
Zolt

Reputation: 2811

Query Optimization with ROW_NUMBER

I have this query:

SELECT 
    PE1.PRODUCT_EQUIPMENT_KEY, -- primary key
    PE1.Customer_Ban, 
    PE1.Subscriber_No, 
    PE1.Prod_Equip_Cd, 
    PE1.Prod_Equip_Txt, 
    PE1.Prod_Equip_Category_Txt--,
    -- PE2.ep_rnk ------------------ UNCOMMENT THIS LINE
FROM 
    INT_ADM.Product_Equipment_Dim PE1
    INNER JOIN 
    ( 
        SELECT 
            PRODUCT_EQUIPMENT_KEY,
            ROW_NUMBER() OVER (PARTITION BY Customer_Ban, Subscriber_No ORDER BY Start_Dt ASC) AS ep_rnk
            FROM INT_ADM.Product_Equipment_Dim PE2
    ) PE2
    ON PE2.PRODUCT_EQUIPMENT_KEY = PE1.PRODUCT_EQUIPMENT_KEY
WHERE 
    Line_Of_Business_Cd = 'M' 
    AND /*v_Date_Start*/ TO_DATE( '2022/01/12', 'yyyy/mm/dd' ) BETWEEN Start_Dt AND End_Dt 
    AND Current_Ind = 'Y' 

If I run it as you see it then it runs in under a second.

If I run it with -- PE2.ep_rnk ------------------ UNCOMMENT THIS LINE uncommented then the query takes up to 5 minutes to complete.

I know it's something to do with ROW_NUMBER() but after looking all over online I can't find a good explanation and solution. Does anyone know why uncommenting that line makes the query so slow, and what I can do about it so it runs fast?

Much appreciate your help in advance.

Upvotes: 0

Views: 51

Answers (1)

Marmite Bomber
Marmite Bomber

Reputation: 21063

The root cause is, that even if the predicate in the where clause allows an efficient access to the rows of the table (but I suspect your below a second response is the time to get the first page of the result), you need in the subquery to access all rows of the table, to window sort them and finaly to join them to the first row source.

So if you comment out the ep_rnk Oracle is smart enought that it do not need to evaluate the subquery at all, because the subquery is on the same table and the join is on the primary key - so no row can be lost or duplicated in the join.

What can you improve?

Not much. If the WHERE condition filters the table very restrictive (and you end with only a small number of PRODUCT_EQUIPMENT_KEY) make the same filer in the subquery:

( 
    SELECT 
        PRODUCT_EQUIPMENT_KEY,
        ROW_NUMBER() OVER (PARTITION BY Customer_Ban, Subscriber_No ORDER BY Start_Dt ASC) AS ep_rnk
        FROM INT_ADM.Product_Equipment_Dim PE2
        --- filer added
        WHERE PRODUCT_EQUIPMENT_KEY in (
            SELECT PRODUCT_EQUIPMENT_KEY
            FROM INT_ADM.Product_Equipment_Dim
            WHERE ... same predicate as in the main query ...
        )
) PE2

If the predicate returns all (or most) of the PRODUCT_EQUIPMENT_KEY the only (often used) way is to pre-calculate the rank e.g. in a materialized view

The materialized view is defined as follows

SELECT 
    PE1.PRODUCT_EQUIPMENT_KEY, -- primary key
    PE1.Customer_Ban, 
    PE1.Subscriber_No, 
    PE1.Prod_Equip_Cd, 
    PE1.Prod_Equip_Txt, 
    PE1.Prod_Equip_Category_Txt--,
    ROW_NUMBER() OVER (PARTITION BY Customer_Ban, Subscriber_No ORDER BY Start_Dt ASC) AS ep_rnk
FROM 
    INT_ADM.Product_Equipment_Dim PE1

and you simple query from it - without a join.

Upvotes: 1

Related Questions