MrCujo
MrCujo

Reputation: 1333

MySQL fulltext search not returning all matching results

I'm using MySQL 8 and I've got a table called records for which I added the following index in order to perform text search on it:

CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`,`barcode`);

Now, I've got the following data (copied it as JSON):

[
  {
    "barcode": "79027066021",
    "title": "White Blood Cells",
    "slug": "White-blood-cells",
    "image": null,
    "description": "rev",
    "artist_id": 3,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 2001,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "ff1500",
    "vinyl_color": "ff1500",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "889854036119",
    "title": "Nilsson Schmilsson",
    "slug": "nilsson-schmilsson",
    "image": null,
    "description": null,
    "artist_id": 6,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1971,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "490d57",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644617",
    "title": "Abbey Road",
    "slug": "abbey-road",
    "image": null,
    "description": null,
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1969,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "46518f",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644618",
    "title": "Revolver",
    "slug": "revolver",
    "image": null,
    "description": null,
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1966,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "46518f",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644619",
    "title": "arevolat",
    "slug": "aurevolat",
    "image": null,
    "description": "this is a test",
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1999,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": null,
    "vinyl_color": null,
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": null,
    "updated_at": null,
    "deleted_at": null
  }
]

So, there are currently 5 records in the table. In order to test the search functionality, I ran the following query:

SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`,`barcode`)
AGAINST ('*rev*' IN BOOLEAN MODE)

I was expecting to get 3 records back:

But I'm only getting back the first two records:

This one is not present:

Even though I'm using * as part of the search.

DDL:

CREATE TABLE `records` (
  `barcode` varchar(48) NOT NULL,
  `title` varchar(160) NOT NULL,
  `slug` varchar(250) NOT NULL,
  `image` varchar(191) DEFAULT NULL,
  `artist_id` bigint unsigned DEFAULT NULL,
  `genre_id` bigint unsigned DEFAULT NULL,
  `num_of_records` bigint NOT NULL,
  `original_release_date` date DEFAULT NULL,
  `reissue_release_date` date DEFAULT NULL,
  `label` longtext,
  `sleeve_side_color` varchar(191) DEFAULT NULL,
  `vinyl_color` longtext,
  `size` bigint DEFAULT NULL,
  `barcode_in_record` tinyint(1) DEFAULT NULL,
  `catalog_number` longtext,
  `owned_copies` bigint DEFAULT '1',
  `active` tinyint(1) DEFAULT '1',
  `created_at` datetime(3) DEFAULT NULL,
  `updated_at` datetime(3) DEFAULT NULL,
  `deleted_at` datetime(3) DEFAULT NULL,
  `original_release_year` bigint NOT NULL,
  `description` text,
  PRIMARY KEY (`barcode`),
  UNIQUE KEY `barcode` (`barcode`),
  UNIQUE KEY `slug` (`slug`),
  KEY `fk_records_artist` (`artist_id`),
  KEY `fk_records_genre` (`genre_id`),
  FULLTEXT KEY `all_records_idx` (`title`,`label`,`description`,`catalog_number`,`barcode`),
  CONSTRAINT `fk_records_artist` FOREIGN KEY (`artist_id`) REFERENCES `artists` (`id`),
  CONSTRAINT `fk_records_genre` FOREIGN KEY (`genre_id`) REFERENCES `genres` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Any idea what's going on?

Thanks

Upvotes: 0

Views: 595

Answers (3)

Kostas Nitaf
Kostas Nitaf

Reputation: 1035

12.9.5 Full-Text Restrictions

The '%' character is not a supported wildcard character for full-text searches.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html

Asterisk *

The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the * operator. [...]

The wildcarded word is considered as a prefix that must be present at the start of one or more words.[...]

The following examples demonstrate some search strings that use boolean full-text operators: [...]

'apple*'

Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

Upvotes: 1

Zaq
Zaq

Reputation: 1274

In MySQL, the wildcard character is % rather than *.

Try AGAINST ('%rev%' IN BOOLEAN MODE)

Sometimes, * acts as a modifier to match 'zero or more instances of the thing preceding it.'

See https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html

However, the above Pattern Matching applies to the extended regular expressions (such as when using RLIKE).

For a Boolean Full-text Search, the * is a truncation operator. The * at the end of your expression means that rev must be present at the start of one or more words. This explains the missing record in your results.

I suspect the first * is ignored.

More info: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

Upvotes: 1

Rick James
Rick James

Reputation: 142518

"%", "*", and other punctuation marks have specific and different meanings for AGAINST, LIKE, and RLIKE.

MATCH(title, ...) AGAINST('*rev*')  -- ignores the leading punctuation mark
title LIKE '%rev%'  -- finds the desired 3 rows, but only in `title`; and slow
title RLIKE 'rev'  -- finds the desired 3 rows, but only in `title`; and slow
title LIKE '%rev%' OR label LIKE '%rev%' OR ... -- even slower

FULLTEXT, when appropriate, is almost always faster than LIKE or RLIKE; often much faster. I suggest you change the requirements and/or expectations. As you see from above, there is no syntax that is both simple and fast.

Upvotes: 2

Related Questions