Reputation: 1324
story
I have two codes which perform the same task with the different way.
number_of_factors =10
number_of_formula_portions=5
number_of_travellers=15
cycles=2
logically 2nd one should get more performance because of only 1 DB call, time-saving. But practically 2nd one takes more time to evaluate query.
I have a dynamically generated union query. It has 10*5*15*2 (number_of_factors number_of_formula_portionsnumber_of_travellers*number_of_cycles) union statements. When I run it DB is taking too much time. But when I run it for one traveler via the application, It is fine. I thought logically reading all data at once has a lot of performance, But DB is getting stuck.
UNIT QUERY
select ? as FACTORNAME,
WEIGHTING,
? as KEYCYCLE,
? as KEYTRAVELLER,
? as KEYSUBFORMULA
from (
(SELECT *
FROM (
(SELECT ID,
ELEMENT_LOGIC_ID,
FACTOR_VALUE1,
FACTOR_VALUE2,
FACTOR_VALUE3,
FACTOR_VALUE4,
FACTOR_VALUE5,
FACTOR_VALUE6,
FACTOR_VALUE7,
WEIGHTING,
START_DATE,
END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =?
AND START_DATE <= ?
AND END_DATE >= ?
AND FACTOR_VALUE1=?
ORDER BY ID DESC )
)
WHERE ROWNUM <= 1)
)
PARAMETERS
F577(String), 0(String), 0(String), 1(String), 577(Long), 2018-06-28 00:00:00.0(Timestamp), 2018-06-28 00:00:00.0(Timestamp), 1(String),
SAMPLE UNION QUERY
select * from (
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1>? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from ( (SELECT * FROM ( (SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE FROM ABC_PRICE_FACTOR WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ? AND FACTOR_VALUE1<? AND FACTOR_VALUE2=? ORDER BY ID DESC )) WHERE ROWNUM <= 1) )
union
...
)
note: dynamically bellow part is different in the query. It is depending on factor match type [equal, lower bound, upper bound]. there are 7 factors. FACTOR_VALUE1,FACTOR_VALUE2.... like wise. So I am not going to show you actual SQL here. it has 1.8 MB query.
equal
FACTOR_VALUE1=?
or lower bound
FACTOR_VALUE1<?
or upper bound
FACTOR_VALUE1>?
business logic behind the scene
sorry guys for not providing actual and provide sample query. I am expecting a comment on my approach. It's like we have data of exam result. there are 10 subjects in school. there are 15 students. there are 2 exam term tests.
those are in DB.
this data can be read in 2 ways.
all ideas are welcome.
Upvotes: 1
Views: 1759
Reputation: 146239
" I thought logically reading all data at once has a lot of performance, But DB is getting stuck."
Up to a point. One database call will likely be more efficient in terms of network traffic. But the actual call you make executes lots of queries and glues them together with UNION: so there is no performance gain to be had if the main problem is the performance of the individual queries.
One obvious change you can make: use UNION ALL rather than UNION if the subqueries are exclusive, and save yourself some unnecessary sorts.
Beyond that, the logic of the subqueries looks suspect: you're hitting the same subset of rows each time, so you should consider using subquery factoring:
with cte as (
SELECT ID, ELEMENT_LOGIC_ID, FACTOR_VALUE1, FACTOR_VALUE2,FACTOR_VALUE3,FACTOR_VALUE4,FACTOR_VALUE5,FACTOR_VALUE6,FACTOR_VALUE7,WEIGHTING,START_DATE, END_DATE
FROM ABC_PRICE_FACTOR
WHERE ELEMENT_LOGIC_ID =? AND START_DATE <= ? AND END_DATE >= ?
)
select ? as FACTORNAME,WEIGHTING,? as KEYCYCLE,? as KEYTRAVELLER,? as KEYSUBFORMULA from (
select weighting from (
select weighting
from cte
where FACTOR_VALUE1=?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1>?
order by id desc )
where rownum <= 1
union all
select weighting from (
select weighting
from cte
where FACTOR_VALUE1<? AND FACTOR_VALUE2=?
order by id desc )
where rownum <= 1
...
)
Warning: tuning without understanding of the data volumes and distribution (skew), data structures or business rules - i.e. what you're asking us to do - is a mug's game. We're just guessing here, and the best you can hope for is that one of those guesses is lucky.
Upvotes: 3
Reputation: 8361
I think such a query can be optimized with quite a dramatic speed improvement. To achieve that, one must understand the logic behind it, though. On Stackoverflow, this is best done by providing a minimal example and some code.
Idea 1) - START_DATE
, END_DATE
You've shown us only ?
, so we don't know if the ?
for the dates of all subqueries are the same. If so, you could filter down the table once in an initial step, without repeating the filtering 1500 times:
WITH datefiltered AS (SELECT * FROM ABC WHERE start_date <= ? AND end_date >= ?)
SELECT ... FROM datefiltered;
Idea 2) - UNION
Your pattern of UNION
a lot of subqueries SELECT * FROM (SELECT ... ORDER BY ...) WHERE rownum <=1
is unusual. That is not a bad thing in itself, but it is likely that the database engine is not optimized for unusual queries.
You are using ORDER BY ID DESC)) WHERE ROWNUM <= 1
, that means you are searching for the newest(?) row in a category.
The traditional pattern is to find a column (or more, or even an expression) and partition the query by it:
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER(PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM ABC
)
WHERE my_rank <= 1;
In your case, the category is likely much more complex, but you can put that in a big CASE
statement that groups your data into your subqueries:
CASE WHEN factor1=xx AND factor2>yy THEN 'mycat1'
WHEN factor3>zz AND factor2<yy THEN 'mycat2'
etc
END;
To put all three together would look like
SELECT id, col1, col2
FROM (
SELECT id, col1, col2,
ROW_NUMBER(PARTITION BY mycategory ORDER BY ID DESC) as my_rank
FROM (
SELECT id, col1, col2,
CASE WHEN factor...
END as mycategory
FROM ABC
WHERE start_date <= xx AND end_date >= yy
)
)
WHERE my_rank <= 1;
Upvotes: 2