Lightness Races in Orbit
Lightness Races in Orbit

Reputation: 385385

Why does MySQL not always use index merge here?

Consider this table:

CREATE TABLE `Alarms` (
  `AlarmId` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `DeviceId` BINARY(16) NOT NULL,
  `Code` BIGINT(20) UNSIGNED NOT NULL,
  `Ended` TINYINT(1) NOT NULL DEFAULT '0',
  `NaturalEnd` TINYINT(1) NOT NULL DEFAULT '0',
  `Pinned` TINYINT(1) NOT NULL DEFAULT '0',
  `Acknowledged` TINYINT(1) NOT NULL DEFAULT '0',
  `StartedAt` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
  `EndedAt` TIMESTAMP NULL DEFAULT NULL,
  `MarkedForDeletion` TINYINT(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`AlarmId`),
  KEY `Key1` (`Ended`,`Acknowledged`),
  KEY `Key2` (`Pinned`),
  KEY `Key3` (`DeviceId`,`Pinned`),
  KEY `Key4` (`DeviceId`,`StartedAt`,`EndedAt`),
  KEY `Key5` (`DeviceId`,`Ended`,`EndedAt`),
  KEY `Key6` (`MarkedForDeletion`)
) ENGINE=INNODB;

And, for this test, populate it like so:

-- Populate some dummy data; 500 alarms for each
-- of 1000 one-second periods
SET @testDevice = UNHEX('00030000000000000000000000000000');

DROP PROCEDURE IF EXISTS `injectAlarms`;
DELIMITER ;;
CREATE PROCEDURE injectAlarms()
BEGIN
    SET @fromdate  = '2018-02-18 00:00:00';
    SET @numdates  = 1000;
    SET @todate    = DATE_ADD(@fromdate, INTERVAL @numdates SECOND);

    -- Create table of alarm codes to join on
    DROP TABLE IF EXISTS `__codes`;
    CREATE TEMPORARY TABLE `__codes` (
        `Code` BIGINT NOT NULL PRIMARY KEY
    );

    SET @startcode = 0;
    SET @endcode   = 499;

    REPEAT
       INSERT INTO `__codes` VALUES(@startcode);
       SET @startcode = @startcode + 1;
    UNTIL @startcode > @endcode END REPEAT;

    -- Add an alarm for each code, for each second in range
    REPEAT
        INSERT INTO `Alarms`
            (`DeviceId`, `Code`, `Ended`, `NaturalEnd`, `Pinned`, `Acknowledged`, `StartedAt`, `EndedAt`)
            SELECT
                @testDevice,
                `Code`,
                TRUE, FALSE, FALSE, FALSE,
                @fromdate, @fromdate
            FROM `__codes`;

        SET @fromdate = DATE_ADD(@fromdate, INTERVAL 1 SECOND);
    UNTIL @fromdate > @todate END REPEAT;
END;;
DELIMITER ;

CALL injectAlarms();

Now, for some datasets the following query works quite well:

SELECT * FROM `Alarms`
WHERE
   ((`Alarms`.`Ended` = FALSE AND `Alarms`.`Acknowledged` = FALSE) OR `Alarms`.`Pinned` = TRUE) AND
   `MarkedForDeletion` = FALSE AND
   `DeviceId` = @testDevice
;

This is because MariaDB is clever enough to use index merges, e.g.:

id    select_type    table    type         possible_keys                 
1     SIMPLE         Alarms   index_merge  Key1,Key2,Key3,Key4,Key5,Key6 

key             key_len  ref     rows     Extra
Key1,Key2,Key3  2,1,17   (NULL)  2        Using union(Key1,intersect(Key2,Key3)); Using where

However if I use the dataset as populated by the procedure above, and flip the query around a bit (which is another view I need, but in this case will return many more rows):

SELECT * FROM `Alarms`
WHERE
  ((`Alarms`.`Ended` = TRUE OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE) AND
   `MarkedForDeletion` = FALSE AND
   `DeviceId` = @testDevice
;

… it doesn't:

id    select_type    table    type   possible_keys
1     SIMPLE         Alarms   ref    Key1,Key2,Key3,Key4,Key5,Key6

key   key_len  ref     rows     Extra
Key2  1        const  144706    Using where

I would rather like the index merges to happen more often. As it is, given the ref=const, this query plan doesn't look too scary … however, the query takes almost a second to run. That in itself isn't the end of the world, but the poorly-scaling nature of my design shows when trying a more exotic query, which takes a very long time:

-- Create a temporary table that we'll join against in a mo
DROP TABLE IF EXISTS `_ranges`;
CREATE TEMPORARY TABLE `_ranges` (
    `Start` TIMESTAMP NOT NULL DEFAULT 0,
    `End`   TIMESTAMP NOT NULL DEFAULT 0,
    PRIMARY KEY(`Start`, `End`)
);

-- Populate it (in reality this is performed by my application layer)
SET @endtime = 1518992216;
SET @starttime = @endtime - 86400;
SET @inter = 900;
DROP PROCEDURE IF EXISTS `populateRanges`;
DELIMITER ;;
CREATE PROCEDURE populateRanges()
BEGIN
REPEAT
    INSERT IGNORE INTO `_ranges` VALUES(FROM_UNIXTIME(@starttime),FROM_UNIXTIME(@starttime + @inter));
    SET @starttime = @starttime + @inter;
UNTIL @starttime > @endtime END REPEAT;
END;;
DELIMITER ;
CALL populateRanges();

-- Actual query
SELECT UNIX_TIMESTAMP(`_ranges`.`Start`) AS `Start_TS`,
COUNT(`Alarms`.`AlarmId`) AS `n`
FROM `_ranges`
LEFT JOIN `Alarms`
ON `Alarms`.`StartedAt` < `_ranges`.`End`
  AND (`Alarms`.`EndedAt` IS NULL OR `Alarms`.`EndedAt` >= `_ranges`.`Start`)

  AND ((`Alarms`.`EndedAt` IS NULL AND `Alarms`.`Acknowledged` = FALSE) OR `Alarms`.`Pinned` = TRUE)
-- Again, the above condition is sometimes replaced by:
-- AND ((`Alarms`.`EndedAt` IS NOT NULL OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE)

 AND `DeviceId` = @testDevice
 AND `MarkedForDeletion` = FALSE
 GROUP BY `_ranges`.`Start`

(This query is supposed to gather a list of counts per time slice, each count indicating how many alarms' [StartedAt,EndedAt] range intersects that time slice. The result populates a line graph.)

Again, when I designed these tables and there weren't many rows in them, index merges seemed to make everything whiz along. But now not so: with the dataset as given in injectAlarms(), this takes 40 seconds to complete!

I noticed this when adding the MarkedForDeletion column and performing some of my first large-dataset scale tests. This is why my choice of indexes doesn't make a big deal out of the presence of MarkedForDeletion, though the results described above are the same if I remove AND MarkedForDeletion = FALSE from my queries; however, I've kept the condition in, as ultimately I will need it to be there.

I've tried a few USE INDEX/FORCE INDEX combinations, but it never seems to use index merge as a result.

What indexes can I define to make this table behave quickly in the given cases? Or how can I restructure my queries to achieve the same goal?

(Above query plans obtained on MariaDB 5.5.56/CentOS 7, but solution must also work on MySQL 5.1.73/CentOS 6.)

Upvotes: 3

Views: 736

Answers (1)

Rick James
Rick James

Reputation: 142528

Wow! That's the most complicated "index merge" I have seen.

Usually (perhaps always), you can make a 'composite' index to replace an index-merge-intersect, and perform better. Change key2 from just (pinned) to (pinned, DeviceId). This may get rid of the 'intersect' and speed it up.

In general, the Optimizer uses index merge only in desperation. (I think this is the answer to the title question.) Any slight changes to the query or the values involved, and the Optimizer will perform the query without index merge.

An improvement on the temp table __codes is to build a permanent table with a large range of values, then use a range of values from that table inside your Proc. If you are using MariaDB, then use the dynamically built "sequence" table. For example the 'table' seq_1_to_100 is effectively a table of one column with numbers 1..100. No need to declare it or populate it.

You can get rid of the other REPEAT loop by computing the time from Code.

Avoiding LOOPs will be the biggest performance benefit.

Get all that done, then I may have other tips.

Upvotes: 1

Related Questions