Reputation: 994
I have a basic question about the Lucene tokenizing process:
TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
TermAttribute termAttribute = tokenStream.addAttribute(TermAttribute.class);
What is a TermAttribute used for, and what does tokenStream.addAttribute(TermAttribute.class) do?
Thanks!
Upvotes: 2
Views: 1210
Reputation: 2113
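TermAttribute holds the text of the current token. tokenStream.addAttribute(TermAttribute.class) returns the stream's TermAttribute instance, creating one if the stream does not already have it. The same instance is reused for every token: each call to incrementToken() overwrites its contents.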
If you are also interested in a token's position increment, request that attribute from the stream in the same way:
PositionIncrementAttribute posIncrAtt = tokenStream.addAttribute(PositionIncrementAttribute.class);
Using the instances of TermAttribute and PositionIncrementAttribute, you can now access/change token text and position increment information in the following way:
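termAttribute.termBuffer()  // char[] holding the token text; term() returns it as a String (buffer() is the CharTermAttribute name for this method)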
posIncrAtt.getPositionIncrement()
posIncrAtt.setPositionIncrement()
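To make the pattern concrete, here is a minimal sketch in plain Java that imitates the idea without depending on Lucene. Everything in it (MiniTokenStream, TermAttr, PosIncrAttr, and the rule that drops the word "the") is a hypothetical stand-in invented for illustration, not Lucene's actual classes or implementation. It shows two things: addAttribute hands back the same instance every time, and each incrementToken() call overwrites that instance in place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for Lucene's attribute classes (NOT the real API).
class TermAttr { String term = ""; }
class PosIncrAttr { int positionIncrement = 1; }

// Simplified model of a token stream: splits on whitespace and drops "the".
class MiniTokenStream {
    private final Map<Class<?>, Object> attributes = new LinkedHashMap<>();
    private final String[] words;
    private int next = 0;

    MiniTokenStream(String text) {
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Get-or-create: return the instance registered for this class,
    // creating and registering it on first use.
    <T> T addAttribute(Class<T> type) {
        Object existing = attributes.get(type);
        if (existing == null) {
            try {
                existing = type.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot create " + type, e);
            }
            attributes.put(type, existing);
        }
        return type.cast(existing);
    }

    // Advance to the next kept token, overwriting the shared attribute
    // instances. Each dropped word adds 1 to the position increment.
    boolean incrementToken() {
        int skipped = 0;
        while (next < words.length) {
            String word = words[next++];
            if (word.equals("the")) {
                skipped++;
                continue;
            }
            addAttribute(TermAttr.class).term = word;
            addAttribute(PosIncrAttr.class).positionIncrement = skipped + 1;
            return true;
        }
        return false;
    }
}

class AttributeDemo {
    // Consume the stream, reading the attribute instances obtained up front.
    static String render(String text) {
        MiniTokenStream stream = new MiniTokenStream(text);
        TermAttr term = stream.addAttribute(TermAttr.class);
        PosIncrAttr posIncr = stream.addAttribute(PosIncrAttr.class);
        StringBuilder out = new StringBuilder();
        while (stream.incrementToken()) {
            if (out.length() > 0) out.append(' ');
            out.append(term.term).append('+').append(posIncr.positionIncrement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("the quick the brown fox")); // prints "quick+2 brown+2 fox+1"
    }
}
```

With real Lucene classes the shape of the loop is the same: fetch the attributes once before iterating, then read them after each successful incrementToken().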
Refer to http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/analysis/package-summary.html for more details
Upvotes: 3