Reputation: 32004

Multiple words query in Lucene

For example, a Lucene document has a field "description" whose content is [hello foo bar]. I want the query [hello f] to hit this document, while [hello ff] and [hello b] should not.

I build the query programmatically, e.g. by adding PrefixQuery and TermQuery instances to a BooleanQuery, but it doesn't work as expected. The index uses StandardAnalyzer.

Test cases:

a): new PrefixQuery(new Term("description", "hello f")) -> 0 hits

b): PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f*")) -> 0 hits

c): PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f")) -> 0 hits

Any recommendations? Thanks!

Upvotes: 1

Answers (2)

Adam Dyga

Reputation: 8896

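It doesn't work because you are passing multiple words to a single Term object. A Term is compared against one indexed token, and StandardAnalyzer indexed "hello", "foo" and "bar" as separate tokens, so no indexed term starts with "hello f". If you want every search word to be matched as a prefix, you need to: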

1. Tokenize the input string with your analyzer. It will split the search text "hello f" into "hello" and "f":

    TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(searchText));
    CharTermAttribute termAttribute = tokenStream.getAttribute(CharTermAttribute.class);

    List<String> tokens = new ArrayList<String>();
    tokenStream.reset(); // must be called before incrementToken() in Lucene 4.0 and later
    while (tokenStream.incrementToken()) {
        tokens.add(termAttribute.toString());
    }
    tokenStream.end();
    tokenStream.close();

2. Put each token into a Term, wrap each Term in a PrefixQuery, and add all the PrefixQuery instances to a BooleanQuery.

EDIT: For example like this:

BooleanQuery booleanQuery = new BooleanQuery();

for (String token : tokens) {
    booleanQuery.add(new PrefixQuery(new Term(fieldName, token)), Occur.MUST);
}
}
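
To see what the BooleanQuery above will and will not match, here is a minimal plain-Java model of its semantics. It is only a sketch, not Lucene code: the class name PrefixAllModel and its methods are made up for illustration, and the tokenizer merely approximates StandardAnalyzer by lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of a BooleanQuery whose clauses are all
// PrefixQuery + Occur.MUST. No Lucene classes are used here.
class PrefixAllModel {

    // Rough stand-in for StandardAnalyzer: lowercase, then split on
    // anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // True when every query token is a prefix of at least one document token.
    // Token order and adjacency are deliberately ignored, as in a BooleanQuery.
    static boolean matches(String document, String query) {
        List<String> docTokens = tokenize(document);
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false; // assumption: an empty query matches nothing
        }
        for (String queryToken : queryTokens) {
            boolean found = false;
            for (String docToken : docTokens) {
                if (docToken.startsWith(queryToken)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String doc = "hello foo bar";
        System.out.println(matches(doc, "hello f"));  // true
        System.out.println(matches(doc, "hello ff")); // false
        System.out.println(matches(doc, "hello b"));  // true: "bar" starts with "b"
    }
}
```

Note the last case: because the clauses are independent, word order is not enforced, so [hello b] also hits through "bar". If the prefix word has to follow "hello" directly, a position-aware query such as MultiPhraseQuery would probably be needed instead.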

Upvotes: 1

rrsk

Reputation: 124

Have you tried NGram or EdgeNGram tokenization while indexing?

http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ngram/NGramTokenizer.html
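
The idea behind edge n-grams is to index every leading prefix of each token, so that an ordinary term lookup behaves like a prefix search. A minimal plain-Java sketch of what such a step would emit for one token follows. The class EdgeNGramSketch, the method edgeNGrams and its minGram/maxGram parameters are illustrative names, not the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of edge n-gram generation; not Lucene code.
class EdgeNGramSketch {

    // Returns the leading substrings of token with lengths from minGram
    // up to maxGram (capped at the token length). Assumes minGram >= 1.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("foo", 1, 3)); // [f, fo, foo]
    }
}
```

With "f", "fo" and "foo" all stored in the index, the search word "f" can be looked up as an exact term, at the cost of a larger index.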

Upvotes: 0
