Reputation: 57974
If I have a table on my DB called product_tags
with 2 fields: tag_id
and tag_name
Here is the schema:
CREATE TABLE `product_tags` (
`tag_id` int(11) NOT NULL auto_increment,
`tag_name` varchar(255) NOT NULL,
PRIMARY KEY (`tag_id`),
UNIQUE KEY `tag_name` (`tag_name`)
) ENGINE=MyISAM AUTO_INCREMENT=84 DEFAULT CHARSET=utf8
Say here some tags in it:
And I want to do a search on the string "yellow gold diamond band"
I only want to pull the following tags:
Because only those tags are exactly in the string. yellow and diamond are both in the string but not together so the yellow diamond
tag should be ignored.
-Additionally if possible
If I did the search for "yellow gold blue diamond band"
I only want to pull the following tags:
the diamond
tag would be ignored because the blue diamond
tag would be the match.
How can I do this?
Upvotes: 1
Views: 1311
Reputation: 10356
I cant think of any good way to do this in SQL directly.
However if i were to implement it in my application logic, this is what the pseudo logic would probably be like
1. Split the search string "yellow gold diamond band" using " " character. string[] search
2. Take the 1st value from the array i.e. yellow in this case.
3. Do a SELECT * FROM product_tags WHERE tag_name LIKE 'yellow%'
4. This will return "yellow gold" and "yellow diamond"
5. Loop through each of the results in 4
a. Split each of these results using " " string [] result
b. If the split array contains has count = 1, we found an exact match for "yellow". No need to search further
c. If the length of the array > 1, Match the search[1] with result[1] till either you have exhausted the split array and find a match or dont find one
d. If more than one match has been found, the longest match is considered
6. Go back to step 2 and repeat for the next string i.e search[1]
Upvotes: 0
Reputation: 2634
You could probably do something like:
WHERE @searchTerm LIKE CONCAT('%', tag_name, '%')
Not very efficient for lots of tags, but it would work in the simple cases given.
Upvotes: 0
Reputation: 19635
Intuitively you could build an algorithm that iterates over all of the possible word combinations formed by contiguous words within the search phrase, and then find which of those is in your tag table. For instance:
yellow gold blue diamond band
Your possible combinations of contiguous would be:
From this entire list, the only terms that match your original list are:
from this list you could cull any items that repeat the same word, favoring the longer option over the shorter with the assumption that the longer option is more descriptive. Thus, after removing those terms you have:
This looks like the list you want. Now, this approach works but it will become painfully sluggish as the number of terms in a search phrase increases. For instance, just your 5 terms generated 15 potential tag searches. Imagine if you put in 10 words...
Therefore, my honest recommendation is that you use some sort of punctuation to separate tags within a search, thus making it easier to find tags by simply splitting the searh phrase by the punctuation and searching on those terms, like thus:
yellow gold, blue diamond, band
With a comma-delimited list, you now only have 3 search terms rather than 15, making it much easier to search your table of tags.
Upvotes: 1
Reputation: 9124
edit:
select
*
from
product_tags P
where
INSTR('yellow gold diamond band', P.tag_name) > 0
Upvotes: 5