Reputation: 4417
Is it possible to extract all the terms in a Lucene index as a list of strings? I couldn't find that functionality in the documentation. Thanks!
Upvotes: 10
Views: 11369
Reputation: 20099
In Lucene 4 (and 5):
Terms terms = SlowCompositeReaderWrapper.wrap(directoryReader).terms("field");
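That line only gets you the Terms object; to end up with an actual list of strings you still have to walk its TermsEnum. A minimal sketch, assuming Lucene 5.x (in 4.x the iterator call would be terms.iterator(null) instead); the class and method names FieldTerms and listTerms are made up for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper, not part of Lucene.
public final class FieldTerms {

    private FieldTerms() {}

    // Collects every distinct term of one field as a String, in index order.
    public static List<String> listTerms(IndexReader reader, String field) throws IOException {
        List<String> result = new ArrayList<>();

        // Same call as in the answer: view all segments as a single reader.
        Terms terms = SlowCompositeReaderWrapper.wrap(reader).terms(field);
        if (terms == null) {
            // The field does not exist or has no indexed terms.
            return result;
        }

        // Assumption: Lucene 5.x signature. In 4.x use terms.iterator(null).
        TermsEnum termsEnum = terms.iterator();
        BytesRef ref;
        while ((ref = termsEnum.next()) != null) {
            // The BytesRef is reused by the enum, so convert it right away.
            result.add(ref.utf8ToString());
        }
        return result;
    }
}
```

Usage would be something like `List<String> all = FieldTerms.listTerms(directoryReader, "field");`. Keep in mind this loads every term into memory, which may be a problem for a large index.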
Edit:
This seems to be the 'correct' way now (Lucene 6 and up):
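// LuceneDictionary is in the lucene-suggest module (org.apache.lucene.search.spell).
LuceneDictionary ld = new LuceneDictionary(indexReader, "field");
// Named getEntryIterator() in current releases; older 4.x releases called it getWordsIterator().
BytesRefIterator iterator = ld.getEntryIterator();
List<String> terms = new ArrayList<>();
BytesRef byteRef;
while ((byteRef = iterator.next()) != null) {
    terms.add(byteRef.utf8ToString());
}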
Upvotes: 17
Reputation: 188164
Lucene 3:
Java:
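// Assumes path is a filesystem path string; IndexReader.open in 3.x expects a Directory.
IndexReader indexReader = IndexReader.open(FSDirectory.open(new File(path)));
TermEnum termEnum = indexReader.terms();
while (termEnum.next()) {
    Term term = termEnum.term();
    // term.field() is the field name, term.text() is the term itself.
    System.out.println(term.text());
}
termEnum.close();
indexReader.close();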
Java (all terms for a specific field): see "How can I get the list of unique terms from a specific field in Lucene?"
Python: see "Finding a single fields terms with Lucene (PyLucene)"
Upvotes: 12