Sanka
Sanka

Reputation: 1324

Oracle Performance Limitation of SQL UNION query & how to optimize

story

I have two codes which perform the same task with the different way.

number_of_factors =10
number_of_formula_portions=5
number_of_travellers=15
cycles=2
  1. call DB multiple times read data =5*15*2 times DB calls. and use ehcache.
  2. call DB only 1 time (one query contains 10*5*15*2 of sql union operations in one query)

logically 2nd one should get more performance because of only 1 DB call, time-saving. But practically 2nd one takes more time to evaluate query.

I have a dynamically generated union query. It has 10*5*15*2 (number_of_factors number_of_formula_portionsnumber_of_travellers*number_of_cycles) union statements. When I run it DB is taking too much time. But when I run it for one traveler via the application, It is fine. I thought logically reading all data at once has a lot of performance, But DB is getting stuck.

UNIT QUERY

select ? as FACTORNAME,
       WEIGHTING,
       ? as KEYCYCLE,
       ? as KEYTRAVELLER,
       ? as KEYSUBFORMULA
from   (
  (SELECT *
   FROM   (
     (SELECT ID,
             ELEMENT_LOGIC_ID,
             FACTOR_VALUE1,
             FACTOR_VALUE2,
             FACTOR_VALUE3,
             FACTOR_VALUE4,
             FACTOR_VALUE5,
             FACTOR_VALUE6,
             FACTOR_VALUE7,
             WEIGHTING,
             START_DATE,
             END_DATE
     FROM   ABC_PRICE_FACTOR
     WHERE  ELEMENT_LOGIC_ID =?
     AND    START_DATE <= ?
     AND    END_DATE >= ? 
     AND    FACTOR_VALUE1=?
     ORDER BY ID DESC )
   )
  WHERE ROWNUM <= 1)
) 

PARAMETERS

    F577(String), 0(String), 0(String), 1(String), 577(Long), 2018-06-28 00:00:00.0(Timestamp), 2018-06-28 00:00:00.0(Timestamp), 1(String), 

SAMPLE UNION QUERY

    select * from (

     select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) ) 

    union 

    select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1>? ORDER BY ID DESC )) WHERE ROWNUM <= 1) ) 

    union 

    select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1<? AND FACTOR_VALUE2=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) ) 

    union 

    ...

    )

note: dynamically bellow part is different in the query. It is depending on factor match type [equal, lower bound, upper bound]. there are 7 factors. FACTOR_VALUE1,FACTOR_VALUE2.... like wise. So I am not going to show you actual SQL here. it has 1.8 MB query.

equal

FACTOR_VALUE1=?

or lower bound

FACTOR_VALUE1<?

or upper bound

FACTOR_VALUE1>?

business logic behind the scene

sorry guys for not providing actual and provide sample query. I am expecting a comment on my approach. It's like we have data of exam result. there are 10 subjects in school. there are 15 students. there are 2 exam term tests.

those are in DB.

this data can be read in 2 ways.

  1. read all data at once, and filter in application level.[large union query]
  2. read one student's one term results at one by one.[small query]

all ideas are welcome.

sample result

Upvotes: 1

Views: 1759

Answers (2)

APC
APC

Reputation: 146239

" I thought logically reading all data at once has a lot of performance, But DB is getting stuck."

Up to a point. One database call will likely be more efficient in terms of network traffic. But the actual call you make executes lots of queries and glues them together with UNION: so there is no performance gain to be had if the main problem is the performance of the individual queries.

One obvious change you can make: use UNION ALL rather than UNION if the subqueries are exclusive, and save yourself some unnecessary sorts.

Beyond that, the logic of the subqueries looks suspect: you're hitting the same subset of rows each time, so you should consider using subquery factoring:

with cte as (
    SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE 
     FROM ABC_PRICE_FACTOR 
     WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? 
)
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from (
  select weighting from ( 
     select weighting
     from cte
     where FACTOR_VALUE1=? 
     order by id desc )
  where rownum <= 1
  union all
  select weighting from ( 
     select weighting
     from cte
     where FACTOR_VALUE1>? 
     order by id desc )
  where rownum <= 1
  union all
  select weighting from ( 
     select weighting
     from cte
     where FACTOR_VALUE1<? AND FACTOR_VALUE2=?
     order by id desc )
  where rownum <= 1
...
)

Warning: tuning without understanding of the data volumes and distribution (skew), data structures or business rules - i.e. what you're asking us to do - is a mug's game. We're just guessing here, and the best you can hope for is that one of those guesses is lucky.

Upvotes: 3

wolφi
wolφi

Reputation: 8361

I think such a query can be optimized with quite a dramatic speed improvement. To achieve that, one must understand the logic behind it, though. On Stackoverflow, this is best done by providing a minimal example and some code.

Idea 1) - START_DATE, END_DATE

You've shown us only ?, so we don't know if the ? for the dates of all subqueries are the same. If so, you could filter down the table once in an initial step, without repeating the filtering 1500 times:

WITH datefiltered AS (SELECT * FROM ABC WHERE start_date <= ? AND end_date >= ?) SELECT ... FROM datefiltered;

Idea 2) - UNION

Your pattern of UNION a lot of subqueries SELECT * FROM (SELECT ... ORDER BY ...) WHERE rownum <=1 is unusual. That is not a bad thing in itself, but it is likely that the database engine is not optimized for unusual queries.

You are using ORDER BY ID DESC)) WHERE ROWNUM <= 1, that means you are searching for the newest(?) row in a category.

The traditional pattern is to find a column (or more, or even an expression) and partition the query by it:

SELECT id, col1, col2 
  FROM (
        SELECT id, col1, col2, 
               ROW_NUMBER(PARTITION BY mycategory ORDER BY ID DESC) as my_rank 
          FROM ABC
       )
 WHERE my_rank <= 1;

In your case, the category is likely much more complex, but you can put that in a big CASE statement that groups your data into your subqueries:

CASE WHEN factor1=xx AND factor2>yy THEN 'mycat1'
     WHEN factor3>zz AND factor2<yy THEN 'mycat2'
     etc
 END;

To put all three together would look like

SELECT id, col1, col2 
  FROM (
        SELECT id, col1, col2, 
               ROW_NUMBER(PARTITION BY mycategory ORDER BY ID DESC) as my_rank 
          FROM (
                SELECT id, col1, col2,
                       CASE WHEN factor...
                        END as mycategory
                  FROM ABC
                 WHERE start_date <= xx AND end_date >= yy
               )
       )
 WHERE my_rank <= 1;

Upvotes: 2

Related Questions