Reputation: 385385
Consider this table:
CREATE TABLE `Alarms` (
    `AlarmId` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `DeviceId` BINARY(16) NOT NULL,
    `Code` BIGINT(20) UNSIGNED NOT NULL,
    `Ended` TINYINT(1) NOT NULL DEFAULT '0',
    `NaturalEnd` TINYINT(1) NOT NULL DEFAULT '0',
    `Pinned` TINYINT(1) NOT NULL DEFAULT '0',
    `Acknowledged` TINYINT(1) NOT NULL DEFAULT '0',
    `StartedAt` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
    `EndedAt` TIMESTAMP NULL DEFAULT NULL,
    `MarkedForDeletion` TINYINT(1) NOT NULL DEFAULT '0',
    PRIMARY KEY (`AlarmId`),
    KEY `Key1` (`Ended`,`Acknowledged`),
    KEY `Key2` (`Pinned`),
    KEY `Key3` (`DeviceId`,`Pinned`),
    KEY `Key4` (`DeviceId`,`StartedAt`,`EndedAt`),
    KEY `Key5` (`DeviceId`,`Ended`,`EndedAt`),
    KEY `Key6` (`MarkedForDeletion`)
) ENGINE=INNODB;
And, for this test, populate it like so:
-- Populate some dummy data; 500 alarms for each
-- of 1000 one-second periods
SET @testDevice = UNHEX('00030000000000000000000000000000');
DROP PROCEDURE IF EXISTS `injectAlarms`;
DELIMITER ;;
CREATE PROCEDURE injectAlarms()
BEGIN
    SET @fromdate = '2018-02-18 00:00:00';
    SET @numdates = 1000;
    SET @todate = DATE_ADD(@fromdate, INTERVAL @numdates SECOND);

    -- Create table of alarm codes to join on
    DROP TABLE IF EXISTS `__codes`;
    CREATE TEMPORARY TABLE `__codes` (
        `Code` BIGINT NOT NULL PRIMARY KEY
    );

    SET @startcode = 0;
    SET @endcode = 499;
    REPEAT
        INSERT INTO `__codes` VALUES(@startcode);
        SET @startcode = @startcode + 1;
    UNTIL @startcode > @endcode END REPEAT;

    -- Add an alarm for each code, for each second in range
    REPEAT
        INSERT INTO `Alarms`
            (`DeviceId`, `Code`, `Ended`, `NaturalEnd`, `Pinned`, `Acknowledged`, `StartedAt`, `EndedAt`)
        SELECT
            @testDevice,
            `Code`,
            TRUE, FALSE, FALSE, FALSE,
            @fromdate, @fromdate
        FROM `__codes`;
        SET @fromdate = DATE_ADD(@fromdate, INTERVAL 1 SECOND);
    UNTIL @fromdate > @todate END REPEAT;
END;;
DELIMITER ;
CALL injectAlarms();
Now, for some datasets the following query works quite well:
SELECT * FROM `Alarms`
WHERE
    ((`Alarms`.`Ended` = FALSE AND `Alarms`.`Acknowledged` = FALSE) OR `Alarms`.`Pinned` = TRUE) AND
    `MarkedForDeletion` = FALSE AND
    `DeviceId` = @testDevice
;
This is because MariaDB is clever enough to use index merges, e.g.:
id  select_type  table   type         possible_keys
1   SIMPLE       Alarms  index_merge  Key1,Key2,Key3,Key4,Key5,Key6

key             key_len  ref     rows  Extra
Key1,Key2,Key3  2,1,17   (NULL)  2     Using union(Key1,intersect(Key2,Key3)); Using where
However, if I use the dataset as populated by the procedure above and flip the query around a bit (this is another view I need, but in this case it will return many more rows):
SELECT * FROM `Alarms`
WHERE
    ((`Alarms`.`Ended` = TRUE OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE) AND
    `MarkedForDeletion` = FALSE AND
    `DeviceId` = @testDevice
;
… it doesn't:
id  select_type  table   type  possible_keys
1   SIMPLE       Alarms  ref   Key1,Key2,Key3,Key4,Key5,Key6

key   key_len  ref    rows    Extra
Key2  1        const  144706  Using where
I would rather like the index merges to happen more often. As it is, given the ref=const, this query plan doesn't look too scary … however, the query takes almost a second to run. That in itself isn't the end of the world, but the poorly-scaling nature of my design shows when trying a more exotic query, which takes a very long time:
-- Create a temporary table that we'll join against in a mo
DROP TABLE IF EXISTS `_ranges`;
CREATE TEMPORARY TABLE `_ranges` (
    `Start` TIMESTAMP NOT NULL DEFAULT 0,
    `End` TIMESTAMP NOT NULL DEFAULT 0,
    PRIMARY KEY(`Start`, `End`)
);
-- Populate it (in reality this is performed by my application layer)
SET @endtime = 1518992216;
SET @starttime = @endtime - 86400;
SET @inter = 900;
DROP PROCEDURE IF EXISTS `populateRanges`;
DELIMITER ;;
CREATE PROCEDURE populateRanges()
BEGIN
    REPEAT
        INSERT IGNORE INTO `_ranges` VALUES(FROM_UNIXTIME(@starttime), FROM_UNIXTIME(@starttime + @inter));
        SET @starttime = @starttime + @inter;
    UNTIL @starttime > @endtime END REPEAT;
END;;
DELIMITER ;
CALL populateRanges();
-- Actual query
SELECT UNIX_TIMESTAMP(`_ranges`.`Start`) AS `Start_TS`,
       COUNT(`Alarms`.`AlarmId`) AS `n`
FROM `_ranges`
LEFT JOIN `Alarms`
    ON `Alarms`.`StartedAt` < `_ranges`.`End`
    AND (`Alarms`.`EndedAt` IS NULL OR `Alarms`.`EndedAt` >= `_ranges`.`Start`)
    AND ((`Alarms`.`EndedAt` IS NULL AND `Alarms`.`Acknowledged` = FALSE) OR `Alarms`.`Pinned` = TRUE)
    -- Again, the above condition is sometimes replaced by:
    -- AND ((`Alarms`.`EndedAt` IS NOT NULL OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE)
    AND `DeviceId` = @testDevice
    AND `MarkedForDeletion` = FALSE
GROUP BY `_ranges`.`Start`;
(This query is supposed to gather a list of counts per time slice, each count indicating how many alarms' [StartedAt,EndedAt] range intersects that time slice. The result populates a line graph.)
Again, when I designed these tables and there weren't many rows in them, index merges seemed to make everything whiz along. But now not so: with the dataset as given in injectAlarms(), this takes 40 seconds to complete!
I noticed this when adding the MarkedForDeletion column and performing some of my first large-dataset scale tests, which is why my choice of indexes doesn't make a big deal out of MarkedForDeletion. The results described above are the same if I remove AND MarkedForDeletion = FALSE from my queries, but I've kept the condition in, as ultimately I will need it to be there.
I've tried a few USE INDEX/FORCE INDEX combinations, but it never seems to use an index merge as a result.
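For illustration, one representative combination (not the exact statement; I tried several variations on the index lists) looked roughly like this:

-- Illustrative hint only; the exact USE INDEX / FORCE INDEX lists varied
SELECT * FROM `Alarms` FORCE INDEX (`Key1`, `Key2`, `Key3`)
WHERE
    ((`Alarms`.`Ended` = TRUE OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE) AND
    `MarkedForDeletion` = FALSE AND
    `DeviceId` = @testDevice
;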
What indexes can I define to make this table behave quickly in the given cases? Or how can I restructure my queries to achieve the same goal?
(Above query plans obtained on MariaDB 5.5.56/CentOS 7, but solution must also work on MySQL 5.1.73/CentOS 6.)
Upvotes: 3
Views: 736
Reputation: 142528
Wow! That's the most complicated "index merge" I have seen.
Usually (perhaps always), you can make a 'composite' index to replace an index-merge-intersect, and it will perform better. Change Key2 from just (Pinned) to (Pinned, DeviceId). This may get rid of the 'intersect' and speed it up.
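A minimal sketch of that change, reusing the existing index name Key2 (drop-and-re-add in one ALTER is just one way to do it):

-- Sketch: replace the single-column Key2 with a composite (Pinned, DeviceId) index.
-- Index name reused from the original schema; pick whatever name you prefer.
ALTER TABLE `Alarms`
    DROP INDEX `Key2`,
    ADD INDEX `Key2` (`Pinned`, `DeviceId`);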
In general, the Optimizer uses index merge only in desperation. (I think this is the answer to the title question.) Any slight changes to the query or the values involved, and the Optimizer will perform the query without index merge.
An improvement on the temp table __codes is to build a permanent table with a large range of values, then use a range of values from that table inside your Proc. If you are using MariaDB, use the dynamically built "sequence" tables. For example, the 'table' seq_1_to_100 is effectively a one-column table with the numbers 1..100; no need to declare it or populate it.
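For example, the __codes helper could be filled in a single statement, assuming a MariaDB version that ships the SEQUENCE engine (seq_0_to_499 is a virtual one-column table of the numbers 0..499):

-- Sketch: no REPEAT loop needed to build the codes
-- (assumes the MariaDB SEQUENCE engine is available).
INSERT INTO `__codes` SELECT `seq` FROM `seq_0_to_499`;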
You can get rid of the other REPEAT loop by computing the time from Code.
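A sketch of that idea: derive both the code and the timestamp from a single sequence value, so the whole test dataset comes from one INSERT … SELECT (again assuming the MariaDB SEQUENCE engine; 500 codes × 1001 seconds = 500500 rows, matching the procedure above):

-- Sketch: one statement, no loops; Code and StartedAt/EndedAt are both
-- computed from `seq` (seq_0_to_500499 assumes the MariaDB SEQUENCE engine).
INSERT INTO `Alarms`
    (`DeviceId`, `Code`, `Ended`, `NaturalEnd`, `Pinned`, `Acknowledged`, `StartedAt`, `EndedAt`)
SELECT
    @testDevice,
    `seq` MOD 500,
    TRUE, FALSE, FALSE, FALSE,
    DATE_ADD(@fromdate, INTERVAL (`seq` DIV 500) SECOND),
    DATE_ADD(@fromdate, INTERVAL (`seq` DIV 500) SECOND)
FROM `seq_0_to_500499`;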
Avoiding LOOPs will be the biggest performance benefit.
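The same no-loop idea applies to the _ranges helper in the question; something like this could replace populateRanges() (a sketch: seq_0_to_96 assumes the SEQUENCE engine, @starttime is the original value of @endtime - 86400, and 97 rows of 900-second slices match what the loop produces):

-- Sketch: fill `_ranges` without a loop.
INSERT IGNORE INTO `_ranges`
SELECT
    FROM_UNIXTIME(@starttime + `seq` * @inter),
    FROM_UNIXTIME(@starttime + (`seq` + 1) * @inter)
FROM `seq_0_to_96`;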
Get all that done, then I may have other tips.
Upvotes: 1