Reputation: 33
I got an issue with my SQL code. We developed an application which runs on MySQL, and there it runs fine. So I decided to give MariaDB a try and installed it on a dev machine. On a certain query Stmt, i have a performance issue I do not understand. The query is the following:
SELECT SAMPLES.*, UNIX_TIMESTAMP(SAMPLES.SAMPLE_DATE) as TIMESTAMP,RAWS.VALUE, DATAKEYS.RAW_ID, DATAKEYS.DATA_KEY_VALUE, DATAKEYS.DATA_KEY_ID, KEYDEF.KEY_NAME, KEYDEF.LDD_ID
FROM
PDS.TABLE_SAMPLES SAMPLES
RIGHT OUTER JOIN PDS.TABLE_RAW_VALUES RAWS ON SAMPLES.SAMPLE_ID = RAWS.SAMPLE_ID
RIGHT OUTER JOIN PDS.TABLE_SAMPLE_DATA_KEYS DATAKEYS ON(DATAKEYS.RAW_ID = RAWS.RAW_ID AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) OR
(DATAKEYS.RAW_ID = 0 AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID)
RIGHT OUTER JOIN PDS.TABLE_DATA_KEY_DEFINITION KEYDEF ON(DATAKEYS.DATA_KEY_ID = KEYDEF.DATA_KEY_ID)
WHERE
SAMPLES.SAMPLE_ID IN(1991331,1991637,1991941,2046105,2046411,2046717,2047023,2047635,2047941,2048247)
AND (SAMPLES.PARAMETER_ID = 9)
GROUP BY DATAKEYS.DATA_KEY_ID, RAWS.RAW_ID, DATAKEYS.DATA_KEY_ID
ORDER BY SAMPLES.SAMPLE_ID, DATAKEYS.RAW_ID;
As long as I got only ONE value in the "WHERE IN" condition, the query takes ~10ms to execute. That's about the same MySQL 5.6 took. As soon as I add another value there, the query time raises to several minutes. In MySQL, it raises very slowly, the Query shown up tehre takes ~150ms on MySQL and about 140 seconds on the new MariaDB installation using exactly the same datasets.
I'm no SQL expert, can you give me some clues how to optimize the query to run as expected?
Upvotes: 2
Views: 1756
Reputation: 11
Instead of operator 'IN' try using 'exists' and the use the subquery instead of using sample_id's.
Upvotes: 0
Reputation: 1270913
The right outer join
s are being converted to inner joins by the where
clause. So, just use the proper join
type (I'm not sure if this affects the optimization of the query, but it could):
SELECT SAMPLES.*, UNIX_TIMESTAMP(SAMPLES.SAMPLE_DATE) as TIMESTAMP,RAWS.VALUE, DATAKEYS.RAW_ID, DATAKEYS.DATA_KEY_VALUE, DATAKEYS.DATA_KEY_ID, KEYDEF.KEY_NAME, KEYDEF.LDD_ID
FROM PDS.TABLE_SAMPLES SAMPLES JOIN
PDS.TABLE_RAW_VALUES RAWS
ON SAMPLES.SAMPLE_ID = RAWS.SAMPLE_ID JOIN
PDS.TABLE_SAMPLE_DATA_KEYS DATAKEYS
ON (DATAKEYS.RAW_ID = RAWS.RAW_ID AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) OR
(DATAKEYS.RAW_ID = 0 AND DATAKEYS.SAMPLE_ID = SAMPLES.SAMPLE_ID) JOIN
PDS.TABLE_DATA_KEY_DEFINITION KEYDEF
ON DATAKEYS.DATA_KEY_ID = KEYDEF.DATA_KEY_ID)
WHERE SAMPLES.SAMPLE_ID IN (1991331, 1991637, 1991941, 2046105, 2046411, 2046717, 2047023, 2047635, 2047941, 2048247) AND
(SAMPLES.PARAMETER_ID = 9)
GROUP BY DATAKEYS.DATA_KEY_ID, RAWS.RAW_ID, DATAKEYS.DATA_KEY_ID
ORDER BY SAMPLES.SAMPLE_ID, DATAKEYS.RAW_ID;
Next, the best index for this query -- regardless of the number of values in the IN
is the composite index PDS.TABLE_SAMPLES(PARAMETER_ID, SAMPLE_ID)
. This handles the WHERE
clause.
Because your query runs quickly under some circumstances, I assume the other tables have the appropriate indexes for the joins.
Upvotes: 1