John Bustos
John Bustos

Reputation: 19544

Oracle - Query for large dataset taking very long time - Is there a way to optimize it?

Suppose I had a very large table in my Oracle DB filled with data for thousands of items. This data would get updated very frequently throughout the day and each update gets a timestamp.

So, for example, the table looks as follows (I know column names are bad, this is only illustration):

TBLDaily:

Date:         ItemNo:     CharA:  ....  CharN:    Time_Stamp:
2014/02/15    123         ....                    2014/02/15 10:00AM
2014/02/15    123         ....                    2014/02/15 11:00AM
2014/02/15    123         ....                    2014/02/15 02:13PM
2014/02/15    234         ....                    2014/02/20 01:00PM
2014/02/15    234         ....                    2014/02/20 09:00PM
   ...
2014/02/16    123         ....                    2014/02/20 08:15PM
   ...

Then, I have a table with the same item numbers that stores other information, but it stays static throughout the month, so it would look as follows:

TBLMonthly:

Date:          ItemNo:    CharA:   .... CharK:
2014/01/31     123        ....          
2014/01/31     234        ....          
2013/12/31     123        ....          
2013/12/31     234        ....          
  ...

Now, I need to get, for each part number & for each date, the latest information available from the Daily table and, for some characteristics, if they don't exist there, then grab them from the monthly table.

My SQL query looks as follows:

WITH All_Data AS
(
  SELECT 
    ROW_NUMBER() OVER(PARTITION BY A.Date, A.ItemNo ORDER BY A.Time_Stamp) AS RN,
    A.Date, A.ItemNo, 
    NVL(A.CharA, B.CharA),
    B.CharB,
    ... whatever other characteristics ...

  FROM 
    TBLDaily A,
    TBLMonthly B,

  WHERE
    A.ItemNo = B.ItemNo
  AND
    A.Date BETWEEN To_Date('2012-12-31', 'yyyy-MM-dd') AND To_Date('2014-02-24', 'yyyy-MM-dd') 
  AND
    B.Date = (SELECT max(Date) FROM TBLMonthly WHERE Date <= A.Date)
)

SELECT * 
FROM All_Data 
WHERE RN = 1
ORDER BY Date, ItemNo

Now, this query is taking an immensely long time to complete (I left it running since yesterday afternoon and it was still executing the query this morning). It is for an extremely large dataset, I know, but I've queried out larger datasets substantially faster. I'm guessing that it is due to the either the:

  1. PARTITION BY or
  2. The continual B.Date = (SELECT max(Date) FROM TBLMonthly WHERE Date <= A.Date)

But I'm not sure and, even worse, I don't know how to fix it to make it more efficient and not take so long.

Any thoughts / help would be greatly appreciated!!

Upvotes: 1

Views: 1354

Answers (2)

Wernfried Domscheit
Wernfried Domscheit

Reputation: 59456

Maybe you can create a virtual column on daily table. Should be like this:

CREATE OR REPLACE FUNCTION Is_latest(V_item IN NUMBER, V_MONTH IN DATE, V_time_stamp IN DATE) RETURN DATE IS
    last_ts DATE;
BEGIN
    SELECT MAX(time_stamp)
    INTO last_ts
    FROM TBLDaily
    WHERE ItemNo = V_item
        AND DATE = V_MONTH;
    IF last_ts = V_time_stamp THEN
        RETURN trunc(last_ts, 'mm')
    ELSE
        RETURN NULL;
    END IF;
END;

ALTER TABLE TBLDaily ADD month_of_TS GENERATED ALWAYS AS (Is_latest(ItemNo, Date, time_stamp));

CREATE INDEX IND_XXX on TBLDaily (ItemNo, month_of_TS);

Select *
from TBLDaily d
   JOIN TBLMonthly m ON m.ItemNo = d.ItemNo and m.Date = d.month_of_TS

Upvotes: 1

Wernfried Domscheit
Wernfried Domscheit

Reputation: 59456

Maybe your query is easier and quicker with this approach:

with t AS
(SELECT DISTINCT LAST_VALUE(CharA) OVER (PARTITION BY Date, ItemNo ORDER BY Time_Stamp) as CharA,
   MAX(Time_Stamp) OVER (PARTITION BY Date, ItemNo) as Time_Stamp
FROM TBLDaily)
SELECT *
FROM t
   JOIN TBLMonthly m ON m.ItemNo = d.ItemNo and t.Time_Stamp = m.Time_Stamp

Upvotes: 2

Related Questions