DarcyThomas

Reputation: 1288

Lucene: Which would be better, many queries or a massive OR query?

Problem: I have a large list of keywords, and I want to check whether any of them are contained in a document or set of documents. (When a document is published, my users want to know whether it contains any of their saved keywords.)

Now let's say there are over 1,000 keywords.

(I am leaning towards the OR version, but I am worried I will hit some query-length or performance limit if I go too far.)


Once I have enough data I will run some comparisons and report back.

Any hints between now and then would be great though.

Upvotes: 0

Views: 151

Answers (1)

Amadan

Reputation: 198294

Single Giant Query Pro: You get ranking by Lucene's scoring algorithm across all of the keywords.

Single Giant Query Con: You make Lucene use a huge amount of memory, as it needs to remember each subquery's result (or part of it) in order to give you that nice ranking that takes all keywords into account. The bigger the OR query, the more memory Lucene needs and the slower the query runs.

I'd say, if at all possible for your purposes, break it up, since OR queries are The Devil (even though it's sometimes necessary to deal with them). That said, a benchmark will tell you more than asking random people for opinions :P
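
If you do break it up, here is a rough sketch of one way to batch the keywords so that each OR query stays under a clause limit. It is plain Java with no Lucene dependency. `KeywordBatcher`, `partition`, `quote`, and `buildOrQueries` are hypothetical names made up for illustration, and the field name `body` is an assumption about your index. The limit of 1,024 is, as far as I know, Lucene's default maximum boolean clause count, but check it against your version. The quoting is deliberately simplistic (each keyword becomes a phrase, with only backslashes and double quotes escaped), so treat the output as input for a query parser, not as a finished solution.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordBatcher {
    // Assumption: Lucene's default maximum boolean clause count is 1024.
    // Verify this against the Lucene version you are running.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    /** Splits the keywords into consecutive batches of at most batchSize items. */
    static List<List<String>> partition(List<String> keywords, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < keywords.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keywords.size());
            batches.add(new ArrayList<>(keywords.subList(start, end)));
        }
        return batches;
    }

    /** Wraps a keyword in double quotes, escaping backslashes and double quotes. */
    static String quote(String keyword) {
        return "\"" + keyword.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one query string per batch, shaped like field:("a" OR "b" OR ...). */
    static List<String> buildOrQueries(String field, List<String> keywords, int batchSize) {
        List<String> queries = new ArrayList<>();
        for (List<String> batch : partition(keywords, batchSize)) {
            List<String> quoted = new ArrayList<>();
            for (String keyword : batch) {
                quoted.add(quote(keyword));
            }
            queries.add(field + ":(" + String.join(" OR ", quoted) + ")");
        }
        return queries;
    }
}
```

With 2,500 keywords and a batch size of `ASSUMED_MAX_CLAUSES`, this yields three query strings. The batch size is also a convenient knob for the benchmark: run the same keyword list at a few different sizes and compare time and memory. Keep in mind that scores from separate batches are not directly comparable, which is the ranking trade-off mentioned above.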

Upvotes: 1
