Vaibhav Gumashta

Reputation: 994

Lucene TokenStream

I have a basic question about the Lucene tokenizing process:

TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);    
TermAttribute termAttribute = tokenStream.addAttribute(TermAttribute.class);

What is a TermAttribute used for, and what does tokenStream.addAttribute(TermAttribute.class) do?

Thanks!

Upvotes: 2

Views: 1210

Answers (1)

naresh

Reputation: 2113

TermAttribute holds the text of the current token. addAttribute(TermAttribute.class) returns the stream's TermAttribute instance, creating one if it doesn't exist yet. (In Lucene 3.x, TermAttribute is deprecated in favor of CharTermAttribute.)
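To make the "return it, or create it if missing" behavior concrete, here is a minimal sketch in plain Java. It is a simplified stand-in, not Lucene's actual implementation: the class AttributeSourceSketch and the attribute type TermText are hypothetical names invented for this illustration, and the only point is that repeated calls hand back the same instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the get-or-create behavior of addAttribute.
// This is NOT Lucene code; all names here are hypothetical.
public class AttributeSourceSketch {
    // Hypothetical attribute type holding the token text.
    public static class TermText {
        public String text = "";
    }

    // One attribute instance per attribute class.
    private final Map<Class<?>, Object> attributes = new LinkedHashMap<>();

    // Returns the registered instance for this class, creating and
    // registering a new one only if none exists yet.
    public <T> T addAttribute(Class<T> type) {
        Object existing = attributes.get(type);
        if (existing == null) {
            try {
                existing = type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate " + type.getName(), e);
            }
            attributes.put(type, existing);
        }
        return type.cast(existing);
    }

    public static void main(String[] args) {
        AttributeSourceSketch source = new AttributeSourceSketch();
        TermText first = source.addAttribute(TermText.class);
        TermText second = source.addAttribute(TermText.class);
        System.out.println(first == second); // prints "true"
    }
}
```

Because both calls return the same object, a consumer and the tokenizer can share it: the tokenizer writes into it, and the consumer reads from it.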

If you are also interested in a token's position increment, request that attribute as well:

PositionIncrementAttribute posIncrAtt = tokenStream.addAttribute(PositionIncrementAttribute.class);

Using the instances of TermAttribute and PositionIncrementAttribute, you can now access/change token text and position increment information in the following way:

termAttribute.term()
termAttribute.termBuffer()
posIncrAtt.getPositionIncrement()
posIncrAtt.setPositionIncrement(positionIncrement)
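The usual consumption pattern is to fetch the attribute instances once and then re-read them after every successful incrementToken() call, since the stream overwrites the same instances for each token. The sketch below imitates that pattern with a toy whitespace tokenizer in plain Java. It does not use Lucene: TokenStreamSketch, TermText, PositionIncrement and collect are hypothetical names, and the fixed increment of 1 is an assumption of this toy.

```java
import java.util.ArrayList;
import java.util.List;

// Toy whitespace tokenizer that mimics the attribute-reuse pattern.
// This is NOT Lucene code; all names here are hypothetical.
public class TokenStreamSketch {
    // Hypothetical stand-ins for the term and position-increment attributes.
    public static class TermText { public String text = ""; }
    public static class PositionIncrement { public int increment = 1; }

    // Shared instances: created once, overwritten for every token.
    public final TermText term = new TermText();
    public final PositionIncrement posIncr = new PositionIncrement();

    private final String[] words;
    private int next = 0;

    public TokenStreamSketch(String input) {
        String trimmed = input.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Advances to the next token and updates the shared attributes.
    // Returns false once the input is exhausted.
    public boolean incrementToken() {
        if (next >= words.length) {
            return false;
        }
        term.text = words[next++];
        posIncr.increment = 1; // assumption: every token advances one position
        return true;
    }

    // Consumer loop: read the attributes after each incrementToken() call
    // and accumulate increments into an absolute position.
    public static List<String> collect(String input) {
        TokenStreamSketch stream = new TokenStreamSketch(input);
        List<String> out = new ArrayList<>();
        int position = -1;
        while (stream.incrementToken()) {
            position += stream.posIncr.increment;
            out.add(stream.term.text + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("quick brown fox")); // prints "[quick@0, brown@1, fox@2]"
    }
}
```

With a real Lucene TokenStream the loop has the same shape, with reset() called before the first incrementToken() and end() and close() called afterwards.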

Refer to http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/analysis/package-summary.html for more details

Upvotes: 3
