What are the principal differences of sphinx index from mysql index that allow faster searches and aggregation

Question

At the moment I am trying to build a search engine similar to this one http://www.indeed.com/jobs?q=sales&l= (my largest concern is the search filters on the left which are group-by-s of search results by different parameters).

Currently the database we are using is MySQL. I had done some initial research and people kept telling me that my problems arise from not using mysql indexes right. I've spent months on in-depth studies of mysql indexes, caching, etc only to find out that there is simply no way to solve this problem with what mysql offers. B-tree is a good thing but it has its limitations and there is no way to set the kind of index that would allow all types of searches (full-text, range, etc) on multiple columns at the same time (and then also group-by everything in the end multiple times).

After more googling and talking, I've discovered a thing called Sphinx that allegedly is capable of solving these kinds of tasks. However I can't seem to find a nice book or study that would explain and illustrate how exactly Sphinx achieves this. If you could explain me how this works or at least point me in the direction of a good book, that would be awesome! Basically I want to understand the technology behind Sphinx to be 100% sure that it can solve the problems I am facing.

Thanks!

barryhunter · Accepted Answer

At its most basic, Sphinx implements an Inverted Index http://en.wikipedia.org/wiki/Inverted_index

This works well for full-text searching. Incidently pretty sure Mysql's "FULL TEXT" indexes use an inverted index too.

Sphinx is generally better for searching than mysql - because it's a tool designed for the task, rather than a 'bolt on'. It has lots more configuration options. Sphinx also has great empersize on performance, and goes to great lengths to implement that. Eg attributes are held in memory, for fast index lookups.

It also has features like 'multi-queries' which uses a number of optimizations, to run multiple queries on the same dataset - very useful for building these types of interfaces.

... so sphinx will be more complicated to setup, but ulimately the results hopefully will make it worth the effort!

I dont know any document explaining the benefit of sphinx as such. There is a chapter in "high performance mysql" that might be worth a read, but doesnt go into the "how" sphinx works.

THere are also a number of talks, eg http://www.slideshare.net/freelancing_god/sphinx-beyond-the-basics more: https://www.google.co.uk/search?q=sphinx+talk They could be a good way to get an overview of power of sphinx.

There is also this really technical document: http://sphinxsearch.googlecode.com/svn/trunk/doc/internals-index-format.txt

(by the way - the 'things on the left' are called facets. Sphinx does facetting well. having the technical term might help your research. But sphinx itself generally calls them 'group by' or clustering)

What are the principal differences of sphinx index from mysql index that allow faster searches and aggregation

Answers (1)

Related Questions