rockstarjindal

Reputation: 290

Lucene: Multiple words in a single term

Let's say I have a doc like this:

stringfield:123456
textfield:name website stackoverflow

and I build a query in the following manner:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
QueryParser luceneQueryParser = new QueryParser(Version.LUCENE_42, "", analyzer);
Query luceneSearchQuery = luceneQueryParser.parse("textfield:\"name website\"");

This returns the doc as expected, but if I build my query using the Lucene query API:

PhraseQuery firstNameQuery = new PhraseQuery();
firstNameQuery.add(new Term("textfield", "name website"));

it does not give me any results. I have to tokenize "name website" myself and add each token to the PhraseQuery.

Is there a default way in the query API to tokenize the text, as the parser does when parsing a query string? Sure, I can do that myself, but I'd rather not reinvent the wheel if it's already implemented.

Upvotes: 0

Views: 2065

Answers (2)

Ben

Reputation: 391

When you call:

luceneQueryParser.parse("textfield:\"name website\"");

Lucene tokenizes the string "name website" and gets two terms.

When you call:

new Term("textfield","name website")

Lucene does not tokenize the string "name website"; it uses the whole string as a single term.

Given the results you describe, the field textfield must have been indexed and tokenized when you indexed the document.

Upvotes: 0

femtoRgon

Reputation: 33351

You are adding the entire query as a single term to your PhraseQuery. You are on the right track, but when tokenized, that text is not a single term but two. That is, your index has the terms name, website, and stackoverflow, but your query has only one term, "name website", which matches none of those.

The correct way to use a PhraseQuery is to add each term to it separately.

PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("textfield", "name"));
phrase.add(new Term("textfield", "website"));
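
To make the term mismatch concrete, here is a minimal toy model in plain Java. It is not Lucene code: the class PhraseMatchSketch and its methods tokenize and matchesPhrase are hypothetical names, and the lowercase-and-split tokenizer is only a rough stand-in for what an analyzer such as StandardAnalyzer does. It shows why a single untokenized term finds nothing, while terms produced by the same tokenizer used at index time do match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model of term-based phrase matching; this is NOT Lucene code.
class PhraseMatchSketch {

    // Stand-in for an analyzer: lowercase, then split on whitespace.
    // (Assumption: a real analyzer does more, e.g. punctuation handling.)
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // True if phraseTerms occur as consecutive terms in indexedTerms.
    static boolean matchesPhrase(List<String> indexedTerms, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return false;
        }
        int n = phraseTerms.size();
        for (int start = 0; start + n <= indexedTerms.size(); start++) {
            if (indexedTerms.subList(start, start + n).equals(phraseTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index-time terms: name, website, stackoverflow.
        List<String> indexed = tokenize("name website stackoverflow");

        // One untokenized term: no indexed term equals "name website".
        System.out.println(matchesPhrase(indexed, Arrays.asList("name website"))); // false

        // Terms from the same tokenizer: match at positions 0 and 1.
        System.out.println(matchesPhrase(indexed, tokenize("name website"))); // true
    }
}
```

The same idea carries over to the real API: run the query text through the analyzer used at index time and add each resulting token to the PhraseQuery as its own Term.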

Upvotes: 2
