Reputation: 75
I'm using Lucene 5.1.0. After analyzing and indexing a document, I would like to get a list of all the indexed terms that belong to that specific document.
{
    File[] files = FILES_TO_INDEX_DIRECTORY.listFiles();
    for (File file : files) {
        Document document = new Document();
        Reader reader = new FileReader(file);
        document.add(new TextField("fieldname", reader));
        iwriter.addDocument(document);
    }
    iwriter.close();
    IndexReader indexReader = DirectoryReader.open(directory);
    int maxDoc = indexReader.maxDoc();
    for (int i = 0; i < maxDoc; i++) {
        Document doc = indexReader.document(i);
        String[] terms = doc.getValues("fieldname");
    }
}
The terms come back as null. Is there a way to get the saved terms for each document?
Upvotes: 1
Views: 1027
Reputation: 75
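A TextField built from a Reader is indexed but not stored, so doc.getValues("fieldname") has nothing to return. One way to see the terms is to run the same text through the analyzer again. Here is some sample code for the answer, using a TokenStream: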
TokenStream ts = analyzer.tokenStream("myfield", reader);
// The Analyzer class will construct the Tokenizer, TokenFilter(s), and CharFilter(s),
// and pass the resulting Reader to the Tokenizer.
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = ts.addAttribute(CharTermAttribute.class);
try {
    ts.reset(); // Resets this stream to the beginning. (Required)
    while (ts.incrementToken()) {
        // Use AttributeSource.reflectAsString(boolean)
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));
        String term = charTermAttribute.toString();
        System.out.println(term);
    }
    ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
} finally {
    ts.close(); // Release resources associated with this stream.
}
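To end up with a list of terms per document rather than console output, the terms from each pass of the loop above could be gathered into a map keyed by document name. The following is only a rough sketch of that collection pattern in plain Java, with no Lucene dependency. The class `TermsPerDocument` and the helpers `tokenize` and `collect` are hypothetical names, and `tokenize` is a crude stand-in for the analyzer (it just lowercases and splits on non-alphanumeric characters). In real code it would be replaced by the `incrementToken()` loop returning the `CharTermAttribute` values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class TermsPerDocument {

    // Hypothetical stand-in for an Analyzer: lowercases the text and splits on
    // anything that is not a letter or digit. This is NOT what Lucene does
    // internally; swap in the TokenStream loop for real analysis.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) { // split can yield a leading empty string
                terms.add(piece);
            }
        }
        return terms;
    }

    // Builds a map from document name to the terms produced for that document.
    // The analyzer is passed in as a function so it can be replaced easily.
    static Map<String, List<String>> collect(Map<String, String> docs,
                                             Function<String, List<String>> analyzer) {
        Map<String, List<String>> termsByDoc = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            termsByDoc.put(doc.getKey(), analyzer.apply(doc.getValue()));
        }
        return termsByDoc;
    }

    public static void main(String[] args) {
        // Assumed sample input: document name -> document text.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "Lucene indexes terms.");
        docs.put("b.txt", "Terms per document");

        Map<String, List<String>> result = collect(docs, TermsPerDocument::tokenize);
        result.forEach((name, terms) -> System.out.println(name + " -> " + terms));
    }
}
```

Passing the analyzer as a `Function` keeps the per-document bookkeeping separate from the tokenization, so the stand-in can be replaced without touching `collect`.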
Upvotes: 1