Jordan Reiter

Reputation: 21002

How to handle searches for very common keywords

I want to be able to return useful records when a user searches for a keyword that is very, very common in a Solr index. For example, education.

In this case, close to 99% of the records contain that word, so searches for it or for similar words take a long time.

This is for Solr on ColdFusion, but I'm open to solutions that are isolated to just Solr.

Right now I'm thinking of coming up with a list of stopwords and preventing those searches from taking place altogether.

Upvotes: 3

Views: 461

Answers (2)

Kirk Broadhurst

Reputation: 28718

If the user searches on just one term that is exceedingly common then you need to limit your results and advise the user that there were too many matches.

In the more general case, you want an approach with at least two passes. Take your search terms and look up how common each one is. Then filter on the least common terms first and the more common terms last.

For example, a user searches for serendipitous education. You identify that you have 11 matches for serendipitous and 900,000 matches for education. So you apply the serendipitous filter first, leaving 11 matches, and then apply the education filter to those 11, leaving 7 matches.

The key to fast searching is indexing and precomputed statistics. If you have statistics like this on hand, you can dynamically create an optimised approach.
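
A rough sketch of the rarest-first idea, in Python rather than ColdFusion so it can be run on its own. The tiny in-memory index, the sample documents, and the names build_index and search_rarest_first are all made up for illustration; this is not Solr code, just the ordering logic described above.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_rarest_first(index, query):
    """AND the query terms together, starting with the rarest term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # The precomputed statistic here is simply the size of each posting set.
    ordered = sorted(terms, key=lambda t: len(index.get(t, ())))
    matches = set(index.get(ordered[0], ()))
    for term in ordered[1:]:
        if not matches:
            break  # nothing left to filter, skip the common terms entirely
        matches &= index.get(term, set())
    return matches

# Hypothetical sample data
docs = {
    1: "serendipitous education outcomes",
    2: "education policy",
    3: "serendipitous discovery",
    4: "higher education funding",
}
index = build_index(docs)
result = search_rarest_first(index, "serendipitous education")
```

Because the candidate set starts from the rarest term, the very common term is only ever intersected against a small set instead of being scanned in full.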

Upvotes: 0

David Faber

Reputation: 12485

If searches are taking a long time, it could be because you are not limiting the number of results that are returned. The <cfsearch> tag has a maxrows attribute, as well as a startrow attribute, that you can use to limit or paginate the data. Alternatively, you could call Solr's web service directly through a <cfhttp> call:

<cfhttp url="http://localhost:8983/solr/<collection_name>/select/?q=<searchterm>&fl=*,score&rows=100&wt=json" />

Solr will return 10 rows by default; you can change this with the rows parameter. You can use the start parameter as well (note that Solr starts counting with 0 instead of 1). I believe this solution is more flexible, especially if you're using CF 9, as it allows you to paginate while sorting on a field other than score.
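
To make the start/rows arithmetic concrete, here is a minimal sketch in Python (not ColdFusion) that builds the same kind of select URL for a given page. The helper name solr_select_url, the collection name mycollection, and the title sort field are assumptions for illustration; q, fl, start, rows, wt and sort are the standard Solr query parameters.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, collection, term, page=1, page_size=100, sort=None):
    """Build a Solr select URL for a 1-based page number."""
    params = {
        "q": term,
        "fl": "*,score",
        # Solr's start is 0-based, so page 1 begins at offset 0.
        "start": (page - 1) * page_size,
        "rows": page_size,
        "wt": "json",
    }
    if sort:
        # e.g. "title asc"; the field name is hypothetical
        params["sort"] = sort
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

url = solr_select_url(
    "http://localhost:8983", "mycollection", "education",
    page=3, page_size=25, sort="title asc",
)
```

The resulting URL can be passed to <cfhttp> (or any HTTP client); only the page number changes between requests.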

You can find more detail here: http://www.thefaberfamily.org/search-smith/coldfusion-solr-tutorial/

Upvotes: 2
