Sap

Reputation: 5321

Searching problem with Lucene

I have a Lucene index of around 22,000 documents, and I have run into a problem with it while writing a search program.

Each document has title, description and long_description fields, which hold data about different diseases and their symptoms. When I search for a phrase such as "infection of the small intestine", I expect "Cholera" to be the first result. (I am using MultiFieldQueryParser with StandardAnalyzer.)

I expect Cholera to come first because its long_description field contains the exact phrase "infection of the small intestine". Instead, it appears near the bottom of the results, because plenty of other documents mention the term "infection" in the title field (which is substantially shorter than the description fields). This is easy to see in the screenshot of the results.

So just because "Cholera" does not have the most pertinent information in the title field, it ends up near the bottom. I saw the following thread, where the use of "~3" is suggested, but should I really apply that to all of my queries behind the scenes? Isn't there a better way of doing it?

Searching phrases in Lucene

Upvotes: 0

Views: 173

Answers (2)

Bohemian

Reputation: 425288

Make your query boost hits in title high, description medium and long_description low, like this:

title:intestine^100 description:intestine^10 long_description:intestine^1

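This example gives title matches a boost of 100, description matches a boost of 10 and long_description matches a boost of 1 (the default). A boost multiplies the score contribution of its clause rather than adding a fixed amount, and documents with higher total scores are sorted first. You can pick any numbers you like for the boost values.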
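If you would rather not write that string by hand for every search, it can be assembled from the user's input. Here is a minimal sketch in plain Java; `BoostedQuery` and `build` are hypothetical names of my own, not part of Lucene, and the code assumes the input contains no query-syntax special characters (quotes, colons, carets and so on). Multi-word input is wrapped in quotes so that the classic query parser treats it as a phrase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class BoostedQuery {

    // Hypothetical helper (not a Lucene API): builds a classic query parser
    // string that searches the same text in several fields, each with its own boost.
    // Assumption: 'text' contains no query-syntax special characters.
    static String build(String text, Map<String, Integer> boosts) {
        // Quote multi-word input so it is parsed as a phrase, not as separate terms.
        String clause = text.contains(" ") ? "\"" + text + "\"" : text;

        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : boosts.entrySet()) {
            // Emits field:clause^boost for each field, in map iteration order.
            query.add(entry.getKey() + ":" + clause + "^" + entry.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 100);
        boosts.put("description", 10);
        boosts.put("long_description", 1);

        System.out.println(build("intestine", boosts));
    }
}
```

The resulting string can be handed to a regular query parser. Since you are already on MultiFieldQueryParser, it also has a constructor that accepts a map of per-field boosts, which may save you from building the string yourself.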

Upvotes: 1
