I am using the Galago retrieval toolkit (part of the Lemur Project) and I need a list of all vocabulary terms in the collection, i.e. all unique terms. Ideally I need a `List<String>` or a `Set<String>`.

How can I obtain such a list?
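The `DumpKeysFn` class appears to list all the keys (unique terms) of an index part. Adapting its logic, the code looks like this. Note that `fileName` should be the path of a single index part (for example the `postings` part inside the index directory), not the index directory itself: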
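```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Package paths as in Galago 3.x; they may differ in other versions.
import org.lemurproject.galago.core.index.IndexPartReader;
import org.lemurproject.galago.core.index.KeyIterator;
import org.lemurproject.galago.core.index.disk.DiskIndex;

public static Set<String> getAllVocabularyTerms(String fileName) throws IOException {
    Set<String> result = new HashSet<>();
    IndexPartReader reader = DiskIndex.openIndexPart(fileName);
    try {
        // An empty index part has no keys, so return the empty set.
        if (reader.getManifest().get("emptyIndexFile", false)) {
            return result;
        }
        KeyIterator iterator = reader.getIterator();
        while (!iterator.isDone()) {
            result.add(iterator.getKeyString()); // each key is one unique term
            iterator.nextKey();
        }
    } finally {
        reader.close(); // release the file even if reading fails
    }
    return result;
}
```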
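Since the question also mentions `List<String>`, here is a minimal sketch of turning the same key walk into a sorted, duplicate-free list. It does not call Galago directly: `KeyCursor` is a hypothetical interface that models only the three `KeyIterator` calls used above (`isDone`, `getKeyString`, `nextKey`), and `ListCursor` is a hypothetical in-memory stand-in so the helper can be tried without an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class VocabularyCollector {

    // Hypothetical stand-in for Galago's KeyIterator: only the three calls
    // used in the answer are modelled here.
    interface KeyCursor {
        boolean isDone();
        String getKeyString();
        void nextKey();
    }

    // Walks the cursor once; the TreeSet drops duplicates and keeps the
    // terms in alphabetical order, which is then copied into a List.
    static List<String> collectSorted(KeyCursor cursor) {
        Set<String> unique = new TreeSet<>();
        while (!cursor.isDone()) {
            unique.add(cursor.getKeyString());
            cursor.nextKey();
        }
        return new ArrayList<>(unique);
    }

    // Hypothetical in-memory cursor for trying the helper without an index.
    static final class ListCursor implements KeyCursor {
        private final List<String> keys;
        private int pos = 0;

        ListCursor(List<String> keys) {
            this.keys = keys;
        }

        public boolean isDone() {
            return pos >= keys.size();
        }

        public String getKeyString() {
            return keys.get(pos);
        }

        public void nextKey() {
            pos++;
        }
    }
}
```

With a real index, the shortest route is probably to reuse the method above: `List<String> terms = new ArrayList<>(new TreeSet<>(getAllVocabularyTerms(path)));`. The `TreeSet` gives a stable alphabetical order, which a plain `HashSet` does not.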