Reputation: 13
I have an old mysql query that I have performance problems with, as the number of items have grown suddenly. The main problem is probably blatantly obvious for a seasoned sql-expert, but not so for me unfortunately. The original query runs for at least 30 seconds:
SET SQL_BIG_SELECTS = 1;
SET @start_date = '2018-12-31';
SET @end_date = '2019-08-14';
SET @client_id = 16;
SELECT
L.instrument_id as iid,
I.instrument as instrument,
X.portfolio as port,
T.ExcludeValue as exclude_value,
sum(if(L.trade_type=1 and L.trade_date<=@start_date,L.nominal*PS.price*I.scale_factor,0)) as val_start,
sum(if(L.trade_type=1 and L.trade_date<=@end_date,L.nominal*PE.price*I.scale_factor,0)) as val_end,
sum(if(L.nominal>0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_buy,
sum(if(L.nominal<0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_sell,
sum(if(L.trade_type>1 and L.trade_date>@start_date and L.trade_date<=@end_date,L.nominal*L.price*I.scale_factor,0)) as val_other
FROM ledger L, instruments I, portfolios X, prices PS, prices PE, instrument_types T
WHERE
L.instrument_id NOT IN (95) and
I.instrument_type_id=T.id and
I.id=L.instrument_id and
X.id=L.portfolio_id and
PS.instrument_id=L.instrument_id and
PE.instrument_id=L.instrument_id and
L.is_current=1 and
L.trade_status_id=2 and
L.client_id=@client_id and
L.trade_date<=@end_date and
PS.trade_date=(select min(trade_date) from prices where instrument_id=L.instrument_id and is_current=1 and trade_date>=@start_date and trade_date<=@end_date) and
PE.trade_date=(select max(trade_date) from prices where instrument_id=L.instrument_id and is_current=1 and trade_date>=@start_date and trade_date<=@end_date)
GROUP BY L.instrument_id, L.portfolio_id
This query is extremely slow. I'm guessing that the problem lies in the sub-selects of PS and PE. They are needed as the min/max trade_date are individual for each instrument.
If I split it up into 2 queries, the first runs in 3 ms:
SELECT
L.instrument_id as iid,
I.instrument as instrument,
X.portfolio as port,
T.ExcludeValue as exclude_value,
sum(if(L.trade_type=1 and L.trade_date<=@start_date,L.nominal*I.scale_factor,0)) as val_start,
sum(if(L.trade_type=1 and L.trade_date<=@end_date,L.nominal*I.scale_factor,0)) as val_end,
sum(if(L.nominal>0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_buy,
sum(if(L.nominal<0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_sell,
sum(if(L.trade_type>1 and L.trade_date>@start_date and L.trade_date<=@end_date,L.nominal*L.price*I.scale_factor,0)) as val_other
FROM ledger L, instruments I, portfolios X, prices PS, prices PE, instrument_types T
WHERE
L.instrument_id NOT IN (95) and
I.instrument_type_id=T.id and
I.id=L.instrument_id and
X.id=L.portfolio_id and
L.is_current=1 and
L.trade_status_id=2 and
L.client_id=@client_id and
L.trade_date<=@end_date
GROUP BY L.instrument_id, L.portfolio_id
and the second, an alternative price query, runs in 0.3 seconds:
SELECT d.instrument_id, d.dt_min as dS, ps.price as pS, d.dt_max as dE, pe.price as pE
from (select p1.instrument_id, min(p1.trade_date) as dt_min, max(p1.trade_date) as dt_max from prices p1 where p1.is_current=1 and p1.trade_date>=@start_date and p1.trade_date<=@end_date group by p1.instrument_id) d
left join prices ps on ps.trade_date=d.dt_min
left join prices pe on pe.trade_date=d.dt_max
where ps.instrument_id=d.instrument_id and pe.instrument_id=d.instrument_id
Not sure how I can put the two together in the most efficient way, or even if there is some fundamental issue with them to begin with...any help appreciated.
Upvotes: 0
Views: 77
Reputation: 13
Following Uueerdo leads, the query has become:
SET SQL_BIG_SELECTS = 1;
SET @start_date = '2018-12-31';
SET @end_date = '2019-08-14';
SET @client_id = 16;
SELECT
L.instrument_id as iid,
I.instrument as instrument,
X.portfolio as port,
T.ExcludeValue as exclude_value,
sum(if(L.trade_type=1 and L.trade_date<=@start_date,L.nominal*PS.price*I.scale_factor,0)) as val_start,
sum(if(L.trade_type=1 and L.trade_date<=@end_date,L.nominal*PE.price*I.scale_factor,0)) as val_end,
sum(if(L.nominal>0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_buy,
sum(if(L.nominal<0 and L.trade_type=1 and L.trade_date>@start_date and L.trade_date<=@end_date,-L.nominal*L.price*I.scale_factor,0)) as val_sell,
sum(if(L.trade_type>1 and L.trade_date>@start_date and L.trade_date<=@end_date,L.nominal*L.price*I.scale_factor,0)) as val_other
FROM ledger L
LEFT JOIN instruments I ON L.instrument_id=I.id
LEFT JOIN portfolios X ON L.portfolio_id=X.id
LEFT JOIN instrument_types T on I.instrument_type_id=T.id
INNER JOIN (
SELECT instrument_id, MIN(trade_date) AS start_date, MAX(trade_date) AS end_date
FROM prices
WHERE is_current=1
AND trade_date>=@start_date AND trade_date<=@end_date
GROUP BY instrument_id
) AS pb ON L.instrument_id = pb.instrument_id
INNER JOIN prices AS PS ON pb.instrument_id = PS.instrument_id AND pb.start_date = PS.trade_date
INNER JOIN prices AS PE ON pb.instrument_id = PE.instrument_id AND pb.end_date = PE.trade_date
It's many times faster now...
Thank you!!!
Upvotes: 0
Reputation: 15941
You can convert the use of a correlated subquery such as:
PS.trade_date=(select min(trade_date) from prices where instrument_id=L.instrument_id and is_current=1 and trade_date>=@start_date and trade_date<=@end_date)
into a non-correlated one like this:
(L.instrument_id, PS.trade_date)
IN (SELECT instrument_id, MIN(trade_date)
FROM prices
WHERE is_current=1
AND trade_date BETWEEN @start_date AND @end_date
GROUP BY instrument_id
)
In the correlated version you are using, the subquery is executed independently for every outer result candidate; in the non-correlated one I present, the subquery is executed once and the outer query's candidate results are checked against that set.
In cases an the outer query returns few rows, and or the subquery data would be relatively huge if not correlated, correlated subqueries can be helpful; but in my experience, in most situations non-correlated versions usually perform much better.
Using explicit join syntax, makes the approach of using only one subquery more obvious:
...
INNER JOIN (
SELECT instrument_id, MIN(trade_date) AS start_date, MAX(trade_date) AS end_date
FROM prices
WHERE is_current=1
AND trade_date BETWEEN @start_date AND @end_date
GROUP BY instrument_id
) AS pb ON L.instrument_id = pb .instrument_id
INNER JOIN prices AS PS ON pb.instrument_id = PS.instrument_id AND pb.start_date = PS.trade_date
INNER JOIN prices AS PE ON pb.instrument_id = PE.instrument_id AND pb.end_date = PE.trade_date
Upvotes: 1