Reputation: 547
We wanted to build a search engine for online/offline products. While researching, we learned about techniques such as the inverted index, TF-IDF and other general-purpose search algorithms. We used Lucene, which has all of these built in, and our basic search platform is now ready.
Later we realized that a generic search engine can return almost anything that matches. For example, if I search for "black shoes", the results include documents containing both "black" and "shoes", but also documents containing only one of those terms. So a black shirt can easily show up in the results, just with lower relevance.
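To make the problem concrete, here is a toy inverted index with TF-IDF scoring in plain Python. It is only a sketch of the general technique, not Lucene's actual scoring formula, and the tokenizer, the smoothed IDF variant and the product titles are all made up for illustration. With OR semantics, the query "black shoes" still returns the black shirt, only ranked below the black shoes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive analyzer: lowercase and split on whitespace (real analyzers do much more).
    return text.lower().split()


class InvertedIndex:
    def __init__(self, docs):
        self.docs = docs
        # term -> {doc_id: term frequency in that document}
        self.postings = defaultdict(dict)
        for doc_id, text in docs.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf

    def idf(self, term):
        # Smoothed inverse document frequency: rarer terms get a higher weight.
        n = len(self.docs)
        df = len(self.postings.get(term, {}))
        return math.log((n + 1) / (df + 1)) + 1

    def search(self, query):
        # OR semantics: any document containing at least one query term is a hit.
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf(term)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical catalogue used only for this example.
docs = {
    "p1": "black leather shoes",
    "p2": "black cotton shirt",
    "p3": "brown running shoes",
    "p4": "white cotton shirt",
}
index = InvertedIndex(docs)

for doc_id, score in index.search("black shoes"):
    print(doc_id, docs[doc_id], round(score, 2))
```

The black leather shoes match both terms and rank first, while the black cotton shirt and the brown running shoes each match one term and still appear further down. Pure term statistics have no notion that "shoes" is the thing being searched for and "black" only qualifies it.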
So we thought product classification could come to our rescue. We would classify our products based on the attributes they carry, then parse the query string the same way to work out what the user is looking for, and match the two directly. I am not sure whether this is the right approach.
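A minimal sketch of the classify-and-parse idea, assuming each product already carries structured attributes (here a hypothetical `color` and `category`) and that a small attribute vocabulary exists. Both the vocabulary and the helper names `parse_query` and `match` are invented for illustration; in practice the vocabulary would be mined from your own catalogue. Recognised attribute values become hard filters, and any leftover words fall back to a naive full-text check on the title.

```python
# Hypothetical attribute vocabulary; in practice this would come from your catalogue.
ATTRIBUTE_VOCAB = {
    "color": {"black", "brown", "white"},
    "category": {"shoes", "shirt"},
}

# Hypothetical products that were classified when they were indexed.
PRODUCTS = [
    {"id": "p1", "title": "Black leather shoes", "color": "black", "category": "shoes"},
    {"id": "p2", "title": "Black cotton shirt", "color": "black", "category": "shirt"},
    {"id": "p3", "title": "Brown running shoes", "color": "brown", "category": "shoes"},
]


def parse_query(query):
    """Split a query into recognised attribute values and leftover free-text terms."""
    attributes, free_text = {}, []
    for term in query.lower().split():
        for attr, values in ATTRIBUTE_VOCAB.items():
            if term in values:
                attributes[attr] = term
                break
        else:
            # The term is not a known attribute value, so keep it for text matching.
            free_text.append(term)
    return attributes, free_text


def match(query, products=PRODUCTS):
    attributes, free_text = parse_query(query)
    hits = []
    for product in products:
        # Every recognised attribute must match the product's classified value exactly.
        if not all(product.get(attr) == value for attr, value in attributes.items()):
            continue
        # Leftover terms must appear as whole words in the title (naive full-text stand-in).
        title_terms = set(product["title"].lower().split())
        if all(term in title_terms for term in free_text):
            hits.append(product["id"])
    return hits


print(match("black shoes"))
print(match("leather shoes"))
```

Here "black shoes" is parsed into a color filter and a category filter, so the black shirt is excluded outright instead of merely ranked lower. The trade-off is brittleness: a query like "black shoe" would not be recognised without stemming or a synonym list, so a common compromise is to use the parsed attributes as strong boosts rather than strict filters.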
So I want to know: what techniques are usually used when building a search engine for a niche market?
Upvotes: 0
Views: 416
Reputation: 5003
Lucene is definitely one of the best APIs you can use to build a search engine. Still, I would advise you to use Solr.
Solr uses Lucene under the hood but gives you a lot of built-in features and an excellent visual admin console.
As for your problem, as is very often the case, it is not a matter of which tool you use but of how you use it. You can customize the search behaviour in Lucene/Solr to get the results you want.
You have two options, which you can adopt separately or together:
1) Create a set of contexts for the user to choose from. For example, Amazon Search lets you choose among product-related contexts (e.g. "All Departments", "Beauty", "Games", etc.). This narrows down the set of candidate products;
2) Use a SpanNearQuery or a PhraseQuery with a slop of 1, so that documents where the query terms appear close together get boosted.
Of course, these options only help if the documents in the index have been built with the structure that best suits your needs.
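The two options above can be combined, as sketched below in plain Python. This does not reproduce how Lucene's SpanNearQuery or PhraseQuery compute scores; it only illustrates the idea under stated assumptions. The `department` field, the `PROXIMITY_BOOST` weight and the helper names are hypothetical. The context acts as a hard filter, every document with at least one query term gets a base score, and documents whose terms occur in query order within `slop` extra positions get a bonus that shrinks as the gap grows. Only in-order matches are considered here (similar to SpanNearQuery with in-order matching); a sloppy PhraseQuery also tolerates reordered terms at a higher slop cost.

```python
from itertools import product

PROXIMITY_BOOST = 2.0  # hypothetical weight for the proximity bonus


def min_in_order_gap(tokens, terms):
    """Smallest number of extra tokens between the query terms, taken in query order.

    Returns None if a term is missing or the terms never occur in that order.
    """
    position_lists = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if any(not positions for positions in position_lists):
        return None
    best = None
    # Brute force over position combinations; fine for short titles in a sketch.
    for combo in product(*position_lists):
        if all(a < b for a, b in zip(combo, combo[1:])):
            gap = combo[-1] - combo[0] - (len(combo) - 1)
            if best is None or gap < best:
                best = gap
    return best


def search(query, products, department=None, slop=1):
    terms = query.lower().split()
    results = []
    for prod in products:
        # Option 1: the user-chosen context (department) is a hard filter.
        if department is not None and prod["department"] != department:
            continue
        tokens = prod["title"].lower().split()
        base = sum(1 for term in set(terms) if term in tokens)
        if base == 0:
            continue
        score = float(base)
        # Option 2: boost documents whose terms lie within `slop` positions of each other.
        gap = min_in_order_gap(tokens, terms)
        if gap is not None and gap <= slop:
            score += PROXIMITY_BOOST / (1 + gap)
        results.append((prod["id"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical indexed products, each carrying a department field.
PRODUCTS = [
    {"id": "p1", "department": "fashion", "title": "black leather shoes"},
    {"id": "p2", "department": "fashion", "title": "black shirt and white shoes"},
    {"id": "p3", "department": "toys", "title": "doll with black shoes"},
    {"id": "p4", "department": "fashion", "title": "black shoes"},
    {"id": "p5", "department": "fashion", "title": "white shirt"},
]

print(search("black shoes", PRODUCTS, department="fashion"))
```

With the "fashion" context selected, the exact phrase ranks first, "black leather shoes" (one word in between, within slop 1) comes second, and "black shirt and white shoes" matches both words but gets no proximity bonus. The toy doll is filtered out by the context, and "white shirt" matches no query term at all.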
Upvotes: 1