user109764
user109764

Reputation: 654

MySQL InnoDB FULLTEXT search over JSON generated STORED column is slower than LIKE

Table:

CREATE TABLE `stores` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `slug` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
  `value` json DEFAULT NULL,
  `html` mediumtext COLLATE utf8mb4_unicode_ci
       GENERATED ALWAYS AS (json_unquote(json_extract(`value`,'$.html')))
       STORED,
  PRIMARY KEY (`id`),
  KEY `slug` (`slug`),
  FULLTEXT KEY `html` (`html`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Query:

select id from `stores`  where  MATCH(stores.html) AGAINST ('forum*' IN BOOLEAN MODE)  limit 20  

Takes 0.14 seconds

Explain:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE stores NULL fulltext html html 0 const 1 100.00 Using where; Ft_hints: no_ranking, limit = 20

When query:

select id from `stores`  where   stores.html like '%forum%' limit 20

Takes only 0.003 seconds

Explain:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE stores NULL ALL NULL NULL NULL NULL 134101 100.00 Using where

I remember, when I first implemented this virtual generated fields over json it appeared to be faster than like, but now after implementing it over all the fields I noticed the site got slower. So I started analyzing the simple queries and found out that fulltext is actually significantly slower!

When I add SQL_NO_CACHE after select it makes no difference.

What am I missing? Thanks

Upvotes: 0

Views: 471

Answers (1)

Rick James
Rick James

Reputation: 142346

This query is abnormally fast:

select id from `stores`
    where stores.html like '%forum%' limit 20

Because it only looked at enough rows to find 20 that had that string. I think you will find that this takes significantly longer since it will check every row:

select id from `stores`
    where stores.html like '%non-existent-text%' limit 20

Another possible reason is that MATCH found hundreds, maybe thousands, or rows before it got to doing the LIMIT. So time this:

select id from `stores`  
    where  MATCH(stores.html) AGAINST ('qwertyui' IN BOOLEAN MODE)  limit 20  

The bottom line is that you may need to live with such inconsistencies. I believe (without solid proof your your dataset) than MATCH will usually be faster than LIKE. Note the word "usually".

Upvotes: 0

Related Questions