Mala

Reputation: 14823

Efficient MySQL text search

I have a forum written in PHP using MySQL, and I'd like to add forum search. It will let users search for particular strings and filter on metadata such as post date and subject. The metadata can be searched efficiently because most of those fields are indexed, but I expect the primary use case to be plain text search, without any metadata filters that could trim the results.

After some testing I have found that, contrary to what most people report, SQL_CALC_FOUND_ROWS is significantly faster (approx. 1.5x) than running the query twice to get the number of results, so the best query I have is:

SELECT SQL_CALC_FOUND_ROWS * FROM blahblah WHERE content LIKE '%term%' LIMIT whatever, whatever;

Unsurprisingly, this is really slow because it has to text-match every single forum post in the database. Is there anything I can do to improve on this? Would putting an index on the content (TEXT) field even help when using the LIKE operator? How does one normally do this?

Upvotes: 2

Views: 1405

Answers (1)

GolezTrol

Reputation: 116110

An index on the column will help, even with the LIKE operator, but not when the pattern also starts with a wildcard. So for 'term%' an index will be beneficial, but for '%term%' it will not.
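To see the difference on your own data, a sketch along these lines may help. It assumes the table and column from the question (blahblah, content); the index name idx_content_prefix and the prefix length of 50 are made up for illustration. MySQL requires a prefix length for a regular index on a TEXT column, and the optimizer may still choose a full scan if it estimates that to be cheaper.

```sql
-- Regular (B-tree) index on a TEXT column; MySQL requires a prefix length.
-- idx_content_prefix and the length 50 are illustrative choices, not recommendations.
CREATE INDEX idx_content_prefix ON blahblah (content(50));

-- Pattern anchored at the start: the index can be used to narrow the search.
EXPLAIN SELECT * FROM blahblah WHERE content LIKE 'term%';

-- Leading wildcard: the index cannot narrow the search, so expect a full table scan.
EXPLAIN SELECT * FROM blahblah WHERE content LIKE '%term%';
```

Comparing the `type` and `key` columns of the two EXPLAIN results should show whether the index is picked up.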

Instead, have a look at FULLTEXT indexes. If you add such an index to a TEXT column, MySQL indexes the individual words and allows all kinds of search-engine-like searches. To search, you use MATCH() ... AGAINST instead of LIKE.
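A minimal sketch of what that could look like, reusing the table and column from the question. The index name ft_content and the LIMIT values are assumptions for illustration. Note that FULLTEXT indexes need a MyISAM table, or InnoDB on MySQL 5.6 or later.

```sql
-- Add a FULLTEXT index on the post body (ft_content is an illustrative name).
-- Requires MyISAM, or InnoDB on MySQL 5.6+.
ALTER TABLE blahblah ADD FULLTEXT INDEX ft_content (content);

-- Natural-language search for the word 'term' instead of LIKE '%term%'.
-- The LIMIT values are placeholders for your paging parameters.
SELECT SQL_CALC_FOUND_ROWS *
FROM blahblah
WHERE MATCH(content) AGAINST('term')
LIMIT 0, 20;

-- Total number of matches, ignoring the LIMIT, as in the original query.
SELECT FOUND_ROWS();
```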

See the docs: https://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Disclaimer: I suggest you read the documentation carefully once you have done some first experiments. FULLTEXT indexes are powerful, but they have their limits.

FULLTEXT indexes take up quite some space, and the way they are built depends on server-level MySQL settings, so they may behave differently between a local setup and a production server.

For instance, they index complete words but leave out very short words and certain stop-words. Also, because they index words, you won't be able to search parts of words. Looking for 'term' will not find 'determine' out of the box.
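The whole-word behaviour can be checked with something like the following, assuming a FULLTEXT index already exists on content. Boolean mode supports a trailing * for prefix matches, which helps for words that start with the search term, but there is no equivalent for a leading wildcard.

```sql
-- Natural-language mode matches whole words only:
-- finds posts containing 'term', but not 'terminal' or 'determine'.
SELECT * FROM blahblah
WHERE MATCH(content) AGAINST('term');

-- Boolean mode with a trailing *: matches words that START with 'term',
-- such as 'terms' or 'terminal'. It still does not match 'determine'.
SELECT * FROM blahblah
WHERE MATCH(content) AGAINST('term*' IN BOOLEAN MODE);
```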

So make sure those indexes can do what you want, and if you are on shared hosting, make sure they can be configured and tuned the way you like before you commit to a large implementation.
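One way to compare a local setup with the server is to inspect the full-text system variables on both. This is only a sketch; which variables apply depends on the storage engine and MySQL version.

```sql
-- MyISAM full-text settings, e.g. ft_min_word_len and ft_stopword_file.
SHOW VARIABLES LIKE 'ft_%';

-- InnoDB full-text settings (MySQL 5.6+), e.g. innodb_ft_min_token_size.
SHOW VARIABLES LIKE 'innodb_ft_%';
```

Changing values such as the minimum word length generally needs access to the server configuration and a rebuild of the FULLTEXT index afterwards, which may not be possible on shared hosting.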

Upvotes: 6
