Reputation: 729
I'm looking for a search engine that I can point to a column in my database that supports advanced functions like spelling correction and "close to" results.
Right now I'm just using
SELECT <column> FROM <table> WHERE <colname> LIKE '%<searchterm>%'
and I'm missing some results, particularly when users misspell items.
I've written some code to fix misspellings by running the search term through a spellchecker, but thought there might be a better out-of-the-box option. Google turns up lots of options for indexing and searching an entire site, whereas I really just need to index and search this one table column.
Upvotes: 3
Views: 180
Reputation: 11933
Apache Solr is a great search engine that provides:
(1) N-gram indexing: it searches not just for complete strings but also for partial substrings, which helps greatly in getting similar results.
(2) An out-of-the-box spell corrector based on edit distance, which gives you a "did you mean chicago" when the user types in chicaog.
(3) An out-of-the-box fuzzy search option. Fuzzy searches return close matches for your query; for example, a user who types in GA-123 would obtain VMDEO-123 as a result.
(4) A "More Like This" component, which helps in the same way as the options above.
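To make the n-gram idea in (1) concrete, here is a minimal sketch in plain Python. It is not how Solr implements it; the function names, the padding scheme, and the 0.3 threshold are all assumptions chosen for illustration.

```python
def trigrams(text):
    """Return the set of character trigrams of a padded, lowercased string."""
    # Padding (an illustrative choice) lets word boundaries count as trigrams.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def close_matches(term, values, threshold=0.3):
    """Rank values by trigram similarity to term, best first.

    The threshold is a hypothetical default; tune it against real data.
    """
    scored = [(similarity(term, value), value) for value in values]
    return [value for score, value in sorted(scored, reverse=True)
            if score >= threshold]
```

Calling `close_matches("chicaog", ["chicago", "boston", "chico"])` ranks "chicago" first and drops "boston", because a misspelling still shares most of its trigrams with the intended word.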
Solr (built on the Lucene search library) is open source and is steadily becoming the de facto standard in the vertical search industry. It is excellent for database searches; indexing a database column, as you describe, is a cakewalk for Solr. Lucene and Solr are used by many Fortune 500 companies as well as Internet giants.
Sphinx Search Engine is also great (I love it too, as it has a very low footprint for everything and is C++ based), but to put it simply, Solr is much more popular.
Python support and APIs are available for both. However, Sphinx runs as a native executable, while Solr is an HTTP service. So for Solr you simply call the Solr URL from your Python program, which returns results that you can send to your front end for rendering. It is as simple as that.
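Calling Solr over HTTP from Python could look roughly like the sketch below. The host, the core name `products`, and the field name `name` are hypothetical; the `/select` handler, the `q`, `rows` and `wt` parameters, and the trailing `~` for a fuzzy match are standard Solr/Lucene query features, but check them against your Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_solr_url(base_url, core, field, term, fuzzy=True, rows=10):
    """Build a Solr select URL; a trailing ~ requests a fuzzy term match."""
    # Note: Lucene special characters in `term` are not escaped in this sketch.
    query = f"{field}:{term}~" if fuzzy else f"{field}:{term}"
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


def search(base_url, core, field, term):
    """Call Solr over HTTP and return the list of matching documents."""
    with urlopen(build_solr_url(base_url, core, field, term)) as response:
        payload = json.load(response)
    return payload["response"]["docs"]
```

With a Solr instance running locally, `search("http://localhost:8983", "products", "name", "chicaog")` would return the matching documents as a list of dictionaries ready to pass to the front end.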
So far so good. Coming to your question:
First, ask yourself whether you really need a search engine. Search engines are good for all the use cases mentioned above, but they are really made for searching across huge amounts of full-text data or millions of rows of tabular data. Algorithms like "did you mean", similar records, and spell correction can be written on top of what you already have. Before settling on Solr, please also search Google for (1) Peter Norvig's spell corrector and (2) N-gram indexing. It is quite possible that by writing just a few lines of code you will get exactly what you were looking for.
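As a rough idea of what "a few lines of code" can mean, here is a sketch in the spirit of Peter Norvig's spell corrector, simplified to a single edit and using the values already in the column as the dictionary. The function names and the sample data are assumptions for illustration, not Norvig's exact code.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else the input."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


# Hypothetical column contents; in practice, load these from the table.
counts = Counter(["chicago", "chicago", "boston"])
print(correct("chicaog", counts))
```

The corrected term can then be fed into the existing LIKE query, so misspelled input still finds the row.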
I leave it up to you to decide :)
Upvotes: 3
Reputation: 2284
Before going down the Solr/Sphinx route for full-text indexing - which adds complexity and its own overhead - you can try the built-in full-text engine in PostgreSQL, if that is the database you are using. It's easy to set up and performs better than LIKE queries.
Check out https://github.com/hcarvalhoalves/django-tsearch2
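For reference, a bare PostgreSQL full-text query issued from Python might look like the sketch below. The table `items` and column `name` are hypothetical, and the `%s` placeholder assumes a driver such as psycopg2; `to_tsvector`, `plainto_tsquery` and the `@@` match operator are PostgreSQL built-ins.

```python
# Hypothetical table "items" and column "name"; adjust to your schema.
FTS_SQL = (
    "SELECT name FROM items "
    "WHERE to_tsvector('english', name) @@ plainto_tsquery('english', %s)"
)


def fts_query(search_term):
    """Return (sql, params) suitable for cursor.execute(sql, params)."""
    # The search term travels as a bound parameter, never via string formatting.
    return FTS_SQL, (search_term,)
```

With an open cursor this would be run as `cursor.execute(*fts_query("red bicycle"))`. Note that full-text search handles stemming and word matching rather than misspellings; for "close to" matches, PostgreSQL's pg_trgm extension is worth a look as well.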
Upvotes: 0
Reputation: 464
I would suggest looking into open source technologies like Sphinx Search.
Upvotes: 1