Reputation: 2156
This query returns rows ordered by relevance when searching for a species name. I use it for an autocomplete suggestion list and the relevance calculation works fine, but the query is a bit slow on a large table, and I’m grateful for any tips on how to optimize it (MySQL). My main question is:
EDIT: I use InnoDB as table type, so unfortunately I can not use fulltext indexing in this case (only works with MyIsam tables).
SQL-fiddle here: http://sqlfiddle.com/#!2/f03c4c/5
SELECT QUERY:
SET @search ='Boletus a'; SELECT id, genus, species, fullname, (CASE WHEN (CONCAT(genus, ' ', species)=@search) THEN 1 ELSE 0 END) # EXACT MATCH OF WHOLE NAME + (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF WHOLE NAME + (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT('%',@search,'%')) THEN 1 ELSE 0 END) # LIKE MATCH OF WHOLE NAME + (CASE WHEN (genus=@search) THEN 1 ELSE 0 END) #EXACT MATCH OF genus + (CASE WHEN (species=@search) THEN 1 ELSE 0 END) #EXACT MATCH OF species + (CASE WHEN (genus LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF genus + (CASE WHEN (species LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) #MATCH BEGINNING OF species AS relevans FROM species WHERE `fullname` LIKE CONCAT('%',@search,'%') ORDER BY relevans DESC, genus, species LIMIT 50;
Background: A species name consists of at least two parts, genus and epithet (in my table the epithet column is named ”species”). I have three columns in the table: genus, species and fullname. The column ”fullname” can also contain names of lower taxa (varieties and forms as in the sqlfiddle example). I am open for any suggestions on how to make the search more efficient. Maybe a regexp on the search string and target only the column ”fullname” instead of concatenating two columns?
DATABASE SCHEMA EXAMPLE:
CREATE TABLE species (`id` int, `genus` varchar(50), `species` varchar(50), `fullname` varchar(100)) ; INSERT INTO species (`id`, `genus`, `species`, `fullname`) VALUES (360052, 'Afroboletus', 'azureotinctus', 'Afroboletus azureotinctus'), (360053, 'Afroboletus', 'costatisporus', 'Afroboletus costatisporus'), (464267, 'Afroboletus', 'elegans', 'Afroboletus elegans'), (360054, 'Afroboletus', 'lepidellus', 'Afroboletus lepidellus'), (112100, 'Afroboletus', 'luteolus', 'Afroboletus luteolus'), (464266, 'Afroboletus', 'multijugus', 'Afroboletus multijugus'), (112101, 'Afroboletus', 'pterosporus', 'Afroboletus pterosporus'), (326826, 'Aureoboletus', 'auriporus', 'Aureoboletus auriporus'), (326828, 'Aureoboletus', 'gentilis', 'Aureoboletus gentilis'), (309389, 'Aureoboletus', 'novoguineensis', 'Aureoboletus novoguineensis'), (326829, 'Aureoboletus', 'subacidus', 'Aureoboletus subacidus'), (113146, 'Aureoboletus', 'thibetanus', 'Aureoboletus thibetanus'), (118425, 'Austroboletus', 'cookei', 'Austroboletus cookei'), (118427, 'Austroboletus', 'dictyotus', 'Austroboletus dictyotus'), (412550, 'Austroboletus', 'lacunosus', 'Austroboletus lacunosus'), (159051, 'Boletus', 'aereus', 'Boletus aereus'), (171640, 'Boletus', 'appendiculatus', 'Boletus appendiculatus'), (161237, 'Boletus', 'armeniacus', 'Boletus armeniacus'), (563944, 'Boletus', 'australiensis', 'Boletus australiensis'), (444094, 'Boletus', 'badius', 'Boletus badius'), (215376, 'Boletus', 'brunneus', 'Boletus brunneus'), (129701, 'Boletus', 'bubalinus', 'Boletus bubalinus'), (203954, 'Boletus', 'byssinus', 'Boletus byssinus'), (162779, 'Boletus', 'calopus', 'Boletus calopus'), (129469, 'Boletus', 'caucasicus', 'Boletus caucasicus'), (208740, 'Boletus', 'chrysenteron', 'Boletus chrysenteron'), (486540, 'Boletus', 'cisalpinus', 'Boletus cisalpinus'), (368037, 'Boletus', 'declivitatum', 'Boletus declivitatum'), (104061, 'Boletus', 'depilatus', 'Boletus depilatus'), (356530, 'Boletus', 'edulis', 'Boletus edulis'), (356278, 'Boletus', 'erythropus', 'Boletus erythropus var. immutatus'), (417068, 'Boletus', 'erythropus', 'Boletus erythropus var. erythropus'), (563943, 'Boletus', 'eximius', 'Boletus eximius'), (264716, 'Boletus', 'fechtneri', 'Boletus fechtneri'), (372473, 'Boletus', 'ferrugineus', 'Boletus ferrugineus'), (141943, 'Boletus', 'flavus', 'Boletus flavus'), (247434, 'Boletus', 'fragrans', 'Boletus fragrans'), (302971, 'Boletus', 'fuligineus', 'Boletus fuligineus'), (218213, 'Boletus', 'impolitus', 'Boletus impolitus'), (327048, 'Boletus', 'legaliae', 'Boletus legaliae'), (327051, 'Boletus', 'leptospermi', 'Boletus leptospermi'), (235486, 'Boletus', 'lignatilis', 'Boletus lignatilis'), (354822, 'Boletus', 'luridiformis', 'Boletus luridiformis var. junquilleus'), (354845, 'Boletus', 'luridiformis', 'Boletus luridiformis var. discolor'), (430254, 'Boletus', 'luridiformis', 'Boletus luridiformis var. luridiformis'), (132915, 'Boletus', 'luridus', 'Boletus luridus var. rubriceps'), (417113, 'Boletus', 'luridus', 'Boletus luridus var. luridus'), (241417, 'Boletus', 'megalosporus', 'Boletus megalosporus'), (282394, 'Boletus', 'moravicus', 'Boletus moravicus'), (196024, 'Boletus', 'paluster', 'Boletus paluster') ;
Upvotes: 1
Views: 1828
Reputation: 51868
My advice, forget this query, create a fulltext index.
Create the index covering the columns genus, species and fullname (all in one index). Then query like this:
SELECT * FROM your_table WHERE MATCH(genus, species, fullname) AGAINST ('Boletus a');
You can also use the MATCH(genus, species, fullname) AGAINST ('Boletus a')
in other parts of the query:
SELECT MATCH(genus, species, fullname) AGAINST ('Boletus a') #displays relevancy (a value between 0 and 1)
FROM your_table
WHERE
MATCH(genus, species, fullname) AGAINST ('Boletus a') #filters (obviously)
ORDER BY MATCH(genus, species, fullname) AGAINST ('Boletus a') #also obvious, orders by relevancy
;
Upvotes: 2