Reputation: 26105
I am creating a simple search function for my website using MySQL and PHP. Right now, if I type the word "cat" into the search bar, I will NOT be able to retrieve articles with the word "cats", and vice versa. The same happens with the ending "ed".
The only way I can think of to solve this problem is to remove "s" and "ed" from the end of each word that is longer than a certain length (to avoid turning "Ted" into "T", etc.). However, this simple solution is nowhere near perfect. I'm hoping someone can provide a better solution.
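For illustration, here is a minimal Python sketch of the suffix-stripping rule described above. It assumes a hypothetical threshold of 3, so only words longer than 3 characters are touched. It shows why the rule helps ("cats" becomes "cat", "Ted" is left alone) and also why it falls short ("class" becomes "clas", and "hoped" becomes "hop", colliding with an unrelated word).

```python
# Naive suffix stripping, as described in the question.
# MIN_LENGTH is a hypothetical threshold: only words longer than this are touched.
MIN_LENGTH = 3


def naive_strip(word: str, min_length: int = MIN_LENGTH) -> str:
    """Lowercase the word and drop a trailing "ed" or "s" if the word is long enough."""
    word = word.lower()
    if len(word) > min_length:
        if word.endswith("ed"):
            return word[:-2]
        if word.endswith("s"):
            return word[:-1]
    return word


if __name__ == "__main__":
    for w in ["cats", "Ted", "jumped", "class", "hoped"]:
        print(w, "->", naive_strip(w))
```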
Upvotes: 4
Views: 569
Reputation: 389
A simple query would be:
SELECT * FROM your_table WHERE item LIKE '%name%'
To avoid the "T"/"Ted" problem, use the substr() function to cut the search string down to a fixed length, and then put that string in the WHERE clause.
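A rough Python sketch of this truncate-then-LIKE idea follows. An in-memory SQLite table stands in for the MySQL one; the `articles` table, its `title` column, and the prefix length of 3 are all hypothetical. The term is sliced to a fixed length (the equivalent of `substr()`), LIKE wildcards in user input are escaped, and the pattern is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3
from typing import List

# Hypothetical "universal size": search terms are cut to this many characters.
PREFIX_LENGTH = 3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search(conn: sqlite3.Connection, term: str, prefix_length: int = PREFIX_LENGTH) -> List[str]:
    """Return titles containing the first `prefix_length` characters of `term`."""
    prefix = term[:prefix_length]  # same idea as substr(term, 1, prefix_length)
    pattern = "%" + escape_like(prefix) + "%"
    # Bind the pattern as a parameter instead of concatenating it into the SQL.
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return [title for (title,) in rows]


def make_demo_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the MySQL table (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO articles (title) VALUES (?)",
        [("My cat",), ("Two cats",), ("Dogs",), ("Category list",)],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(search(db, "cats"))
```

Searching for "cats" now also finds "My cat", but it also matches "Category list", because SQLite's LIKE is case-insensitive for ASCII and "cat" is a substring of "Category". A fixed prefix length trades missed matches for false positives like this one.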
Upvotes: 0
Reputation: 1087
Possible solutions:
1. Simplest to implement: use the % wildcard with LIKE, e.g.
LIKE '%cats%'
2. Use Solr for a fast implementation, since optimized algorithms are already implemented there.
Note: you can also cache your search results.
Upvotes: 0
Reputation: 4223
The technique you are referring to is called stemming. Because of the great many influences on languages, this is difficult to handle on your own at the application level. If you do not want to deal with it, you can let MySQL do the heavy lifting for you, depending on which version of MySQL you are running. If you are on version 5.6.4 or later, it is built into the full-text search mechanism for both MyISAM and InnoDB tables. In versions 5.5 through 5.6.3, it is built in for MyISAM but not for InnoDB tables. For version 5.1, there is a plugin available from mnoGoSearch. Prior to 5.1, I think you need to handle it at the application level, but I have not confirmed that.
These links might help get you started.
Be aware of the stopword list, a list of very common and often short words that are ignored in your search text when the query is processed. There are settings to control the stopword list if it is preventing you from getting the results you expect. You will likely also want to set the minimum word length to 2 or 3 (the MyISAM default, ft_min_word_len, is 4) and remove many of the words on the default list.
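To see why these two settings matter, here is a rough Python illustration (not MySQL's actual parser) of a minimum word length and a stopword list filtering what gets indexed. The stopword set is hypothetical and far shorter than MySQL's real list. With a minimum length of 4, a three-letter word like "cat" never reaches the index, so no full-text query can find it.

```python
import re
from typing import Iterable, List

# Hypothetical, tiny stopword list; MySQL's default list is much longer.
STOPWORDS = {"the", "on", "a", "in"}


def indexed_tokens(text: str, min_word_len: int = 4,
                   stopwords: Iterable[str] = STOPWORDS) -> List[str]:
    """Return the words an indexer applying these two rules would keep."""
    stop = set(stopwords)
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_word_len and w not in stop]


if __name__ == "__main__":
    sentence = "The cat sat on the mat"
    print(indexed_tokens(sentence))                  # minimum length 4
    print(indexed_tokens(sentence, min_word_len=3))  # minimum length 3
```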
If you do want to handle stemming on your own or with PHP there is a detailed technical discussion of the Porter Stemming Algorithm by Martin Porter and there are at least two PHP implementations available, an older one in PHP4 by Jon Abernathy that may have some flaws and a newer one in PHP5 by Richard Heyes.
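The full Porter algorithm is too long to reproduce here, but a Python sketch of just its first two steps (1a for plurals, 1b for "-ed"/"-ing") already covers the "cat"/"cats" and "ed" cases from the question. This is a partial re-implementation written for illustration, not either of the PHP libraries mentioned above; for production use, a complete and tested stemmer is the safer choice.

```python
def _is_consonant(word: str, i: int) -> bool:
    """Porter's definition: not a/e/i/o/u; "y" is a consonant only at the
    start of a word or right after a vowel."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not _is_consonant(word, i - 1)
    return True


def _measure(stem: str) -> int:
    """Porter's m: the number of vowel-consonant sequences in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = _is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def _has_vowel(stem: str) -> bool:
    return any(not _is_consonant(stem, i) for i in range(len(stem)))


def _ends_double_consonant(word: str) -> bool:
    return len(word) >= 2 and word[-1] == word[-2] and _is_consonant(word, len(word) - 1)


def _ends_cvc(word: str) -> bool:
    """Consonant-vowel-consonant ending whose last letter is not w, x or y."""
    n = len(word)
    return (n >= 3 and _is_consonant(word, n - 3) and not _is_consonant(word, n - 2)
            and _is_consonant(word, n - 1) and word[-1] not in "wxy")


def porter_step1ab(word: str) -> str:
    """Apply only steps 1a and 1b of the Porter stemmer."""
    word = word.lower()
    if len(word) <= 2:  # the reference implementation leaves 1- and 2-letter words alone
        return word
    # Step 1a: plurals ("sses" -> "ss", "ies" -> "i", "s" -> "", but "ss" is kept).
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Step 1b: "-eed", "-ed", "-ing".
    if word.endswith("eed"):
        if _measure(word[:-3]) > 0:
            word = word[:-1]
    else:
        for suffix in ("ed", "ing"):
            stem = word[:-len(suffix)]
            if word.endswith(suffix) and _has_vowel(stem):
                word = stem
                # Clean-up rules, e.g. "hopping" -> "hop", "filing" -> "file".
                if word.endswith(("at", "bl", "iz")):
                    word += "e"
                elif _ends_double_consonant(word) and word[-1] not in "lsz":
                    word = word[:-1]
                elif _measure(word) == 1 and _ends_cvc(word):
                    word += "e"
                break
    return word


def same_stem(a: str, b: str) -> bool:
    """Treat two words as a match when their (partial) Porter stems agree."""
    return porter_step1ab(a) == porter_step1ab(b)
```

The point of the design is that the same function is applied to both the indexed words and the search terms, so "cat" and "cats" meet at the same stem. The vowel check in step 1b is also what keeps "Ted" from being reduced to "T".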
I am assuming that you are primarily concerned with English but I believe that there is some support for other languages as well.
As mentioned by rnmccall, if you need more advanced search capabilities, you may need to go with Sphinx or Apache Lucene.
Upvotes: 9
Reputation: 10012
I'd recommend using Lucene. It will also put less load on your database, since you aren't running complex queries, just looking up an index. You can also run fuzzy searches with Lucene.
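A toy Python sketch of the two ideas in this answer: an inverted index (word to document ids), so that a search is a dictionary lookup rather than a LIKE scan over every row, and fuzzy matching by edit distance. It only illustrates the concepts and is not Lucene's API. Lucene's fuzzy queries are also based on edit distance, but real engines avoid comparing the query against every indexed term the way this sketch does.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: each word maps to the ids of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_search(index: Dict[str, Set[int]], term: str, max_edits: int = 1) -> Set[int]:
    """Documents containing any indexed word within max_edits of the term."""
    term = term.lower()
    hits: Set[int] = set()
    for token, doc_ids in index.items():  # linear scan: fine for a toy, not for Lucene-sized data
        if levenshtein(term, token) <= max_edits:
            hits |= doc_ids
    return hits
```

With documents "My cat sleeps" and "Two cats play", an exact lookup of "cat" finds only the first one, while a fuzzy search with one allowed edit finds both.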
Upvotes: 1
Reputation: 6525
There is no need to remove "ed" or anything else. Because you are searching for a string within a paragraph, you just need to provide a keyword to search for. That keyword can be a full word or a substring (part of a word).
Example:
Your text is "You are in a black hole."
Now you want to find "black" by providing "bla" as the search string. Then the query would be:
SELECT * FROM TABLE_NAME WHERE YOUR_FIELD_NAME LIKE '%BLA%'
Use the query above to match any content that contains that substring. You can provide any substring of the paragraph or passage you want to search.
Hope it helps.
Upvotes: 0
Reputation: 1380
You can simply use
SELECT * FROM topics WHERE Title LIKE '%cat%'
in your query to find topics whose titles contain "cat" or "cats". If you want to search large text content, you can use full-text search. In that case you have to use MyISAM tables (InnoDB tables also support full-text indexes from MySQL 5.6.4 onward). See the MySQL full-text search documentation.
Upvotes: 0
Reputation: 480
The strategy of removing suffixes described in the question is generally called stemming. If you are still interested in pursuing that strategy, you should check out http://tartarus.org/~martin/PorterStemmer/ for background on stemming. That page also has a PHP implementation of the Porter stemmer and links to more modern algorithms.
This stemming approach is used by Sphinx, which is used to build the Python documentation, among other things.
The main benefit of the stemming approach is that it is straightforward and can be lightweight.
But if you want more sophisticated search capabilities, you should probably use something like Apache Lucene.
Upvotes: 3