Jan K.

Reputation: 1608

Weighting search results

PHP / MySQL backend. I've got a database full of movies YouTube-style. Each video has a name and category. Videos and categories have a m:n relationship.

I'd like my visitors to be able to search for videos by entering all their search terms in a single search field. I can't figure out how to return the best results based on both category matches and occurrences of the terms in the name.

What's the best way to go about something like this? Scoring? That is: check for each search term whether it occurs in the name of the video, and if so, award the video a point; check whether the video is in any category that is also contained in the search query, and if so, award it a point. Then sort by the number of points received? That sounds very expensive in terms of CPU usage.
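To make the scoring idea concrete, here is a minimal sketch of it. It is written in Python for brevity rather than PHP, and the in-memory `videos` list, its field names, and the `score`/`search` helpers are all hypothetical stand-ins for rows that would really come from MySQL.

```python
# Hypothetical sample data; in the real application these rows come from MySQL.
videos = [
    {"name": "Funny cat compilation", "categories": {"cats", "comedy"}},
    {"name": "Cat training basics", "categories": {"pets", "howto"}},
    {"name": "Dog tricks", "categories": {"pets"}},
]


def score(video, terms):
    """One point per term found in the name, one per term naming a category."""
    name_words = set(video["name"].lower().split())
    name_points = sum(1 for term in terms if term in name_words)
    category_points = sum(1 for term in terms if term in video["categories"])
    return name_points + category_points


def search(query, videos):
    """Return the names of matching videos, highest score first."""
    terms = query.lower().split()
    scored = [(score(video, terms), video) for video in videos]
    # Drop videos with no points so only actual hits are sorted.
    hits = [(points, video) for points, video in scored if points > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [video["name"] for _, video in hits]


print(search("cat comedy", videos))
```

Note that this version scans every video on every search, which is exactly the cost I'm worried about.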

Upvotes: 0

Views: 290

Answers (3)

thexplorer

Reputation: 76

First, use full-text search. It can be either MySQL full-text search or some kind of external full-text search engine. I recommend Sphinx. It is very fast and simple, and it can even be integrated with MySQL using SphinxSE (so search indexes look like tables in MySQL). However, you have to install and configure it.

Second, think about splitting search results by search type. Any kind of full-text search will return a list of matched items sorted by relevancy. You can search across all fields and get a single list, but this is a bad idea because hits by name and hits by category will be mixed. To solve this you can run multiple searches: search by name first, then search by category.

As a result you'll have two matching sets, and you have a lot of options for how to display them. Some ideas:

  1. Merge the two sets based on the relevancy rate returned by the search engine. This looks like the result of one single query, but you know what each item is (name hit or category hit), so you can highlight this.
  2. Do the same merge as above but assign different weights to the different sets, for example relevancy = 0.7*name_relevancy+0.3*category_relevancy. This will make search results more natural.
  3. Split results into tabs/groups, e.g. 'There are N titles and M categories matching your query'.
  4. Use bands when displaying results. For each page (assuming you are splitting search results using a paginator) display N items from the first set and M items from the second set (you can display the sets one after the other or shuffle the items). If there are not enough items in one of the sets, just take more items from the other set, so there are always M+N items per page.
  5. Any other way you can imagine.
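The weighted merge from idea 2 could look roughly like the sketch below. It is in Python for brevity, and the function name `merge_weighted` and the `{video_id: relevancy}` input shape are assumptions for illustration. It also assumes both searches report relevancy on a comparable scale, which you may need to normalize yourself.

```python
def merge_weighted(name_hits, category_hits, name_weight=0.7, category_weight=0.3):
    """Merge two {video_id: relevancy} maps into one list ranked by weighted relevancy.

    Each row is (video_id, combined_relevancy, sources), where sources records
    whether the item was a name hit, a category hit, or both.
    """
    merged = []
    for video_id in set(name_hits) | set(category_hits):
        name_rel = name_hits.get(video_id, 0.0)
        category_rel = category_hits.get(video_id, 0.0)
        sources = [
            label
            for label, rel in (("name", name_rel), ("category", category_rel))
            if rel > 0
        ]
        combined = name_weight * name_rel + category_weight * category_rel
        merged.append((video_id, combined, sources))
    # Highest combined relevancy first; ties broken by id for a stable order.
    merged.sort(key=lambda row: (-row[1], row[0]))
    return merged


# Hypothetical relevancy values as two separate searches might return them.
name_hits = {1: 0.9, 2: 0.4}
category_hits = {2: 0.8, 3: 1.0}
for row in merge_weighted(name_hits, category_hits):
    print(row)
```

Keeping `sources` on each row is what lets you highlight whether an item was a name hit or a category hit, as in idea 1.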

You can use this method for any kind of field: name, category, actor, director, etc. However, the more fields you use, the more search queries you have to execute.

Upvotes: 1

Fantius

Reputation: 3862

I don't think you can avoid looking at the title and category of every movie for each search. So the CPU usage for that is a given. If you are concerned about the CPU usage of the sort, it would be negligible in most cases, since you would only be sorting the items that have more than zero points.

Having said that, what you probably want is a system that is partially rule-based and partially point-based. For instance, if you have a title that is equal to the search term, it should come first, regardless of points. Architect your search such that you can easily add rules and tweak points as you see fit to yield the best results.

Edit: In the event of an exact title match, you can take advantage of a DB index and not search the whole table. Optionally, the same goes for category.
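One way to structure the part-rule, part-point approach is sketched below, in Python for brevity. The names `RULES`, `SCORERS`, `exact_title`, and the sample data are hypothetical. Rules are checked first and always outrank points; points only order items that tie on the rules.

```python
# Hypothetical sample data; real rows would come from the database.
videos = [
    {"name": "Alien", "categories": {"horror"}},
    {"name": "Alien invasion documentary", "categories": {"alien", "documentary"}},
    {"name": "Cooking pasta", "categories": {"food"}},
]


def exact_title(video, query):
    """Rule: the title equals the whole search query."""
    return video["name"].lower() == query.lower()


def name_points(video, terms):
    words = set(video["name"].lower().split())
    return sum(1 for term in terms if term in words)


def category_points(video, terms):
    return sum(1 for term in terms if term in video["categories"])


RULES = [exact_title]  # add more rules here, highest priority first
SCORERS = [(1.0, name_points), (1.0, category_points)]  # (weight, scorer); tweak weights


def rank(query, videos):
    terms = query.lower().split()
    ranked = []
    for video in videos:
        flags = tuple(rule(video, query) for rule in RULES)
        points = sum(weight * scorer(video, terms) for weight, scorer in SCORERS)
        if any(flags) or points > 0:
            ranked.append((flags, points, video["name"]))
    # Rule flags compare before points, so a rule match wins regardless of points.
    ranked.sort(key=lambda row: (row[0], row[1]), reverse=True)
    return [name for _, _, name in ranked]


print(rank("alien", videos))
```

Here "Alien" is listed before "Alien invasion documentary" even though the latter collects more points, because the exact-title rule takes precedence.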

Upvotes: 1

fcingolani

Reputation: 576

Using Full-Text Search may help: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html#function_match

You can test several columns at once against an expression.
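As a rough illustration, a query testing two columns at once might be built like this. The sketch is in Python for brevity, and the `videos` table, the `description` column, and the `build_fulltext_query` helper are assumptions. It also assumes a FULLTEXT index covering exactly the columns `(name, description)`, and the `%s` placeholder style used by common Python MySQL drivers.

```python
def build_fulltext_query(search_text, limit=20):
    """Return (sql, params) for a MySQL full-text search over two columns.

    Assumes a FULLTEXT index on videos(name, description); both the table
    and the description column are hypothetical.
    """
    sql = (
        "SELECT id, name, "
        "MATCH(name, description) AGAINST (%s) AS relevancy "
        "FROM videos "
        "WHERE MATCH(name, description) AGAINST (%s) "
        "ORDER BY relevancy DESC "
        "LIMIT %s"
    )
    # The search text is passed as a bound parameter, never interpolated.
    params = (search_text, search_text, limit)
    return sql, params


sql, params = build_fulltext_query("funny cat")
print(sql)
print(params)
```

The same `MATCH ... AGAINST` expression appears in both the select list and the `WHERE` clause so the relevancy value is available for sorting.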

Upvotes: 2
