Reputation: 11200
I'm trying to produce something similar to what QueryParser in Lucene does, but without the parser, i.e. run a string through StandardAnalyzer, tokenize it, and use a TermQuery for each token in a BooleanQuery to produce a query. My problem is that StandardAnalyzer gives me Tokens, not Terms. I can convert a Token to a Term by extracting the string from it with Token.term(), but this is 2.4.x-only and it seems backwards, because I need to add the field a second time. What is the proper way of producing a TermQuery with StandardAnalyzer?
I'm using pylucene, but I guess the answer is the same for Java etc. Here is the code I've come up with:
from lucene import *

def term_match(self, phrase):
    query = BooleanQuery()
    sa = StandardAnalyzer()
    for token in sa.tokenStream("contents", StringReader(phrase)):
        # Token.term() gives the token text; the field name must be repeated here
        term_query = TermQuery(Term("contents", token.term()))
        query.add(term_query, BooleanClause.Occur.SHOULD)
    return query
Upvotes: 4
Views: 2542
Reputation: 3485
I've come across the same problem, and, using Lucene 2.9 API and Java, my code snippet looks like this:
final TokenStream tokenStream = new StandardAnalyzer( Version.LUCENE_29 )
    .tokenStream( fieldName, new StringReader( value ) );
final List< String > result = new ArrayList< String >();
try {
    while ( tokenStream.incrementToken() ) {
        final TermAttribute term = ( TermAttribute ) tokenStream.getAttribute( TermAttribute.class );
        result.add( term.term() );
    }
} finally {
    tokenStream.close();
}
Upvotes: 0
Reputation: 281515
The established way to get the token text is with token.termText() - that API has been there forever. And yes, you'll need to specify a field name to both the Analyzer and the Term; I think that's considered normal. 8-)
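Since the pattern is the same regardless of the analyzer, here is a dependency-free Python sketch of the tokenize-then-build-query flow discussed above. The classes and the crude lowercase/split analyzer below are toy stand-ins for the Lucene API, not the real thing; only the shape of the loop (one Term per token, field name repeated, SHOULD clauses ORed together) mirrors the actual code:

```python
import re

def analyze(text):
    # Crude stand-in for StandardAnalyzer: lowercase and split on non-word chars.
    return [t for t in re.split(r"\W+", text.lower()) if t]

class Term:
    def __init__(self, field, text):
        self.field, self.text = field, text
    def __repr__(self):
        return "%s:%s" % (self.field, self.text)

class TermQuery:
    def __init__(self, term):
        self.term = term

class BooleanQuery:
    def __init__(self):
        self.clauses = []
    def add(self, query, occur):
        self.clauses.append((query, occur))
    def __repr__(self):
        # SHOULD clauses render as an OR between terms.
        return " OR ".join(repr(q.term) for q, _ in self.clauses)

def term_match(phrase, field="contents"):
    query = BooleanQuery()
    for tok in analyze(phrase):
        # The field name has to be repeated when building each Term,
        # just as in the real API.
        query.add(TermQuery(Term(field, tok)), "SHOULD")
    return query

print(term_match("Hello, Lucene world"))
```

Running this prints `contents:hello OR contents:lucene OR contents:world`, which is the same query the pylucene snippet above builds.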
Upvotes: 2