Reputation: 473
I've read quite a few similar posts but none solves my case, which could well be because of my lack of sufficient knowledge, so please bear with me.
One of the search options in my terminological dictionary is "whole words only". At first I was using
WHERE ".$source." RLIKE '[[:<:]]".$keyword."[[:>:]]'
However, this failed to match whole words for the first or second $keyword
when there is more than one. Then I found
WHERE ".$source." REGEXP '[[:<:]]".$keyword."[[:>:]]'
and
WHERE ".$source." REGEXP '(^| )".$keyword."( |$)'
while searching these forums
I just tested both of the above in my PhpMyAdmin and found out that the former executes in 0.0740 seconds, while the latter takes twice as long, 0.1440 seconds, so I guess I should stick with the former.
What bothers me the most is the huge discrepancy in results, e.g. searching for a single word ("tool"):
Using the [[:<:]]
and [[:>:]]
word boundary in PhpMyAdmin returns 34 results.
Using (^| )
and ( |$)
in PhpMyAdmin returns 26 results.
Running the #1 regexp in my PHP script returns 34 results (this is the correct number).
Here's the whole MySQL block:
foreach($keywords as $keyword) {
$query = $db->query("SELECT * FROM ".DICTIONARY_TABLE." " .
"JOIN ".DICTIONARY_THEMES." ON ".DICTIONARY_TABLE.".theme_id = ".DICTIONARY_THEMES.".theme_id ".
"LEFT JOIN ".DICTIONARY_DEFINITIONS." ON ".DICTIONARY_TABLE.".term_id = ".DICTIONARY_DEFINITIONS.".term_id ".
"WHERE ".DICTIONARY_TABLE.".".$source." REGEXP '(^| )".$keyword."( |$)'".
//"WHERE ".DICTIONARY_TABLE.".".$source." REGEXP '[[:<:]]".$keyword."[[:>:]]'".
" ORDER BY ".DICTIONARY_TABLE.".theme_id, ".DICTIONARY_TABLE.".".$source."");
}
I've commented out the search option I'm not using.
Now, if I try TWO keywords, e.g. "cutting tool", I still get 34 results in the page. I'm unsure if I'm doing this right in PhpMyAdmin:
SELECT * FROM `asphodel_dictionary_terms` WHERE english REGEXP '[[:<:]]cutting[[:>:]]';
SELECT * FROM `asphodel_dictionary_terms` WHERE english REGEXP '[[:<:]]tool[[:>:]]'
This returns 44 results for "cutting" and 34 results for "tool". The query using (^| )
... returns 37 + 26 results respectively.
Any feedback that would help me sort things out would be appreciated!
The database contains a total of 109,000 entries in the main table, there are 82 themes in the DICTIONARY_THEMES
table and 727 entries in the DICTIONARY_DEFINITIONS
table. Not a huge database and it won't grow much bigger.
Upvotes: 1
Views: 874
Reputation: 222432
You are getting different results because the two regexes are not identical.
(^| )
means : either the beginning of the string or a space (( |$)
has the same meaning at end of string).
[[:<:]]
and [[:>:]]
are word boundaries : conceptually this refers to characters that separate words, and usually regex engines interpret it as something like : anything but a digit, a letter or an underscore.
So basically the first pattern is more restrictive than the second (space, beginning and end of string are word boundaries, but there are others).
If you have more than one keyword to search for, you would need to repeat the regex matches, like :
WHERE
".$source." RLIKE '[[:<:]]".$keyword1."[[:>:]]'
OR ".$source." RLIKE '[[:<:]]".$keyword2."[[:>:]]'
Or create a new regex by combining the keywords :
WHERE
".$source." RLIKE '[[:<:]](".$keyword1.")|(".$keyword2.")[[:>:]]'
NB : for search requirement, you should consider using MySQL Full Text Search, which are primarily built for the purpose of searching for full words (there are pre-requisites, though).
Upvotes: 1