Reputation: 628
I started develop facet search in my project. As example how it should work I took asos.com. As engine I use mysql for no fulltext search and sphinx for fulltext search. For faster search I denormalized data in db and created special table for this.
CREATE TABLE `item_search` (
`id` int(10) unsigned NOT NULL DEFAULT '0',
`item_id` int(10) unsigned NOT NULL DEFAULT '0',
`tab` tinyint(3) unsigned NOT NULL,
`designer_id` smallint(5) unsigned NOT NULL,
`name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`price` decimal(18,2) NOT NULL,
`arm` tinyint(3) unsigned NOT NULL,
`bridge` tinyint(3) unsigned NOT NULL,
`lens_width` tinyint(3) unsigned NOT NULL,
`rating` int(11) unsigned DEFAULT '0',
`pageviews` int(11) unsigned DEFAULT '0',
`category_id` int(4) unsigned NOT NULL,
KEY `index2` (`item_id`,`tab`,`price`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (tab)
(PARTITION pg VALUES IN (1) ENGINE = InnoDB,
PARTITION rx VALUES IN (2) ENGINE = InnoDB,
PARTITION sun VALUES IN (3) ENGINE = InnoDB,
PARTITION acc VALUES IN (4) ENGINE = InnoDB)
How to this looks in code.
if (Arr::get($args, 'query')){
// Get item ids that was found by query
$args['item_ids'] = Items::findByFullText($args);
}
...//Some code here
if ($item_ids = Arr::get($args, 'item_ids')){
$where[] = "item_id IN (".implode(',', $item_ids).")";
}
...// Build and execute query and return item_ids
I can send multiple categories in multiple groups to search.
if ($categories = Arr::get($args, 'categories')){
// group categories by categories group
$categories = Category::groupCategories($categories);
foreach ($categories as $group_id => $category_group){
foreach ($category_group as $category_id){
$item_ids[$group_id][] = Category::getItemIdsByCategory($category_id);
}
// Sum all item ids by this group
if (sizeof($item_ids[$group_id]) == 1){
$item_ids_by_group[$group_id] = $item_ids[$group_id][0];
}else{
$item_ids_by_group[$group_id] = call_user_func_array('array_merge', $item_ids[$group_id]);
}
}
// Return found item ids by categories
(sizeof($item_ids_by_group) == 1)
? $item_ids_by_group[$group_id]
: call_user_func_array('array_intersect', $item_ids_by_group);
}
This is works pretty fast if enable cache for results, Only one problem that have here that no have possibility fetch categories like asos.do. What logic should be for returning categories ids for category groups? Many thanks for any response.
Upvotes: 0
Views: 177
Reputation: 792
Mysql uses B-TREEs to store indexes, Sorl on Lucene (which is a powerfull Java library especially for searching stuff) uses Term Dictionaries and Binary Search for those terms. Indexing a billion documents in Solr, takes under 1 second, beat that with any sql solution.
Here is a nice tutorial: http://searchhub.org/2009/09/02/faceted-search-with-solr/
I have testeted with 500 milion documents and still got under 1 second results, these exact results can be proven by similar experiences on a simple google search.
Also, I might add, that since the overhead of using Mysql for faceting, it's not in use any more.
Basically the framework that is in used today as a best practice is write in Memcache/Solr and your primary Sql solution and then read only from Solr/Memcache.
Here's an eBay replica I used to work for and used Solr as it's faceting and search engine: http://www.okazii.ro/
Also, if you insist on sphinx, try: http://www.dreamstime.com
Also, check out this comparisson:
Choosing a stand-alone full-text search server: Sphinx or SOLR?
As you can see: Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
Upvotes: 1