lucene spanquery matching same word

Question

I have a system that uses Lucene to search documents according to queries given by the user. When the user's query contains more than one word, I create a SpanNearQuery with one clause per word; the last clause wraps a prefix query, and the slop is 0. For example, if the user input is "new y", this should match both "new year" and "new york".

This works fine. However, if the query contains the same word twice, e.g. "bora bora", even documents with a single occurrence of "bora" are matched.

How can I match only "bora bora*"?

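Code:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

String[] words = querystr.split(" ");
SpanQuery[] clausesWildCard = new SpanQuery[words.length];
for (int i = 0; i < words.length; i++) {
   if (allWordsPrefix || i == words.length - 1)
   {
        // Last word (or every word, if allWordsPrefix is set): match it as a prefix
        PrefixQuery pq = new PrefixQuery(new Term(LOWER_VALUE, words[i]));
        clausesWildCard[i] = new SpanMultiTermQueryWrapper<PrefixQuery>(pq);
   }
   else
   {
        // Any other word: exact term match
        Term clause = new Term(LOWER_VALUE, words[i]);
        clausesWildCard[i] = new SpanTermQuery(clause);
   }
}
// slop = 0, inOrder = false
SpanQuery allTheWords = new SpanNearQuery(clausesWildCard, 0, false);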

EDIT: I have found that this seems to be a known issue: https://issues.apache.org/jira/browse/LUCENE-5932 and https://issues.apache.org/jira/browse/LUCENE-3120

However, I don't understand whether it has been solved or whether there is a workaround.

I upgraded to Lucene 5.0.0, but it keeps happening.
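
To make the symptom concrete, here is a small plain-Java sketch of the difference between unordered and ordered matching at slop 0. It is only a simplified model and does not use or reproduce Lucene's implementation: the class SpanNearSketch and its methods are hypothetical names, documents are plain token lists, and the last query word is treated as a prefix as in the code above. In the unordered model two clauses may be satisfied by the same token, which is how a single "bora" ends up matching "bora bora"; in the ordered model clause i must match the token at position start + i. Judging by the title of LUCENE-3120, the over-matching seems to occur only when inOrder is false, so passing true as the third SpanNearQuery argument may be worth trying, since word order matters for this kind of query anyway.

```java
import java.util.List;

final class SpanNearSketch {

    // Exact match for ordinary words, prefix match for the last query word.
    static boolean termMatches(String token, String word, boolean prefix) {
        return prefix ? token.startsWith(word) : token.equals(word);
    }

    // Simplified "unordered, slop 0" model: every clause must match some token
    // inside a window of words.length tokens. Nothing stops two clauses from
    // matching the same token. Assumes words is non-empty.
    static boolean matchesUnordered(List<String> tokens, String[] words) {
        int n = words.length;
        for (int start = 0; start < tokens.size(); start++) {
            int end = Math.min(start + n, tokens.size());
            boolean all = true;
            for (int i = 0; i < n && all; i++) {
                boolean found = false;
                for (int p = start; p < end && !found; p++) {
                    found = termMatches(tokens.get(p), words[i], i == n - 1);
                }
                all = found;
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    // Simplified "ordered, slop 0" model: clause i must match the token at
    // position start + i, so each clause needs its own token.
    static boolean matchesOrdered(List<String> tokens, String[] words) {
        int n = words.length;
        for (int start = 0; start + n <= tokens.size(); start++) {
            boolean all = true;
            for (int i = 0; i < n && all; i++) {
                all = termMatches(tokens.get(start + i), words[i], i == n - 1);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

With the tokens "visit", "bora", "island" and the query words "bora", "bora", matchesUnordered returns true while matchesOrdered returns false; with the tokens "bora", "bora", "island" both return true.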
