Pinch
Pinch

Reputation: 61

MySQL sporadic MATCH AGAINST behaviour with unique index

When adding a unique key to a multi-table fulltext boolean search, the results cycle through 1 of 3 arbitrary states, with only 1 being correct.

Bear this in mind when checking the sqlfiddle below, as the query MAY initially work correctly - in such case, add whitespace to the left panel then rebuild and rerun - then it should be broken (but it's very hit-and-miss).

http://sqlfiddle.com/#!9/8d95ba/18

Here's the query in question:

SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`
  FROM `item` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

Simple enough. But with the following unique index addition:

ALTER TABLE `item_with_unique` ADD UNIQUE INDEX `unique_item_group` (`group_id`, `name`)

the results cycle arbitrarily between these three states:

  1. All rows are returned as if there was no WHERE clause
  2. The alias matches are return as if there was no OR part to the WHERE clause
  3. Correct results are returned (from my experiences, this has been the rarest)

Behaviour seems to be consistent to whichever of these 3 states it's in until the query is changed in some minor way (adding brackets, say) or the schema is rebuilt - at which point there's a chance it might change.

Is these some sort of restriction I've missed in the MySQL docs that describes this behaviour? Is it a bug? Or have I just done something obviously wrong?

Mysql Version 5.6.35 (sqlfiddle at time of writing).

Sqlfiddle for posterity in case the link dies:

CREATE TABLE `group` (
  `group_id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `name` VARCHAR(256),
  FULLTEXT INDEX `search` (`name`)
) ENGINE = InnoDB;

CREATE TABLE `group_alias` (
  `group_id` INT UNSIGNED NOT NULL,
  `alias` VARCHAR(256),
  CONSTRAINT `alias_group_id`
    FOREIGN KEY (`group_id`)
    REFERENCES `group` (`group_id`),
  FULLTEXT INDEX `search` (`alias`)
) ENGINE = InnoDB;

CREATE TABLE `item` (
  `item_id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `group_id` INT UNSIGNED,
  `name` VARCHAR(255) NOT NULL,
  CONSTRAINT `item_group_id`
    FOREIGN KEY (`group_id`)
    REFERENCES `group` (`group_id`),
  FULLTEXT INDEX `search` (`name`)
) ENGINE = InnoDB;

CREATE TABLE `item_with_unique` LIKE `item`;
ALTER TABLE `item_with_unique` ADD UNIQUE INDEX `unique_item_group` (`group_id`, `name`);

INSERT INTO `group` (`group_id`, `name`) VALUES (1, 'Thompson');
INSERT INTO `group` (`group_id`, `name`) VALUES (2, 'MacDonald');
INSERT INTO `group` (`group_id`, `name`) VALUES (3, 'Stewart');

INSERT INTO `group_alias` (`group_id`, `alias`) VALUES (1, 'Tomson');
INSERT INTO `group_alias` (`group_id`, `alias`) VALUES (2, 'Something');
INSERT INTO `group_alias` (`group_id`, `alias`) VALUES (3, 'MacStewart');

INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (1, 1, 'MacTavish');
INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (2, 1, 'MacTavish; Red');
INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (3, 2, 'MacAgnew');
INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (4, 3, 'Spider');
INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (5, 2, 'blahblah');

INSERT INTO `item_with_unique` SELECT * FROM `item`;


SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
  FROM `item` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

SELECT "Same query, using table with unique index (NOTE: sporadically this is actually correct, in such case, skip to bottom notes)";
SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
  FROM `item_with_unique` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

SELECT "Union of the two OR match conditions seperately (expected result from second query)";
SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
  FROM `item_with_unique` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
UNION
SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
  FROM `item_with_unique` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

SELECT "Now rebuild the schema (add a newline somewhere so sqlfiddle thinks it has changed) and observe that the results of the second query.  It may take multiple attempts but it usually cycles between 3 states:";
SELECT "1: Returns ALL results as if there were no conditions (5 rows)";
SELECT "2: Returns results as if there were no second part to the OR condition (1 row)";
SELECT "3: Returns the correct results (rarely)";

Upvotes: 2

Views: 243

Answers (2)

Martin Vysny
Martin Vysny

Reputation: 3201

Try using IGNORE INDEX to your statement:

SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`
  FROM `item` `i`
  IGNORE INDEX (unique_item_group)
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

MySQL is incredibly dumb to randomly use unique_item_group also for full-text search.

Upvotes: 1

Rick James
Rick James

Reputation: 142518

If you have one-word names and aliases. And you are checking for entire values or leading values. Then FULLTEXT is not the type of index you need.

A simple INDEX(name), together with name LIKE 'Mac%' would be very efficient.

If you have a long phrase with lots of words, and "MacDonald" might be in the middle of it, then FULLTEXT and MATCH ... AGAINST is the right way to go.

With either type of indexing,

WHERE table1 ...
   OR table2 ...

will be inefficient. Roughly speaking, the optimizer will have to do a "cross join" to get all combinations of rows between the two tables, then see which of those match one or other match/like.

Perhaps you have "overnormalized" the data? Can't both the name and alias be in the same table? The query will be much faster, and there will be optimization techniques to make it even faster. What you have will be noticeably slow with only 1K rows; what I propose can be optimized beyond millions, maybe billions of rows.

Upvotes: 0

Related Questions