boomz

Reputation: 667

Get vocabulary list in Galago

I am using the Galago retrieval toolkit (part of the Lemur project) and I need a list of all vocabulary terms in the collection, i.e. all unique terms. Specifically, I need a List<String> or Set<String>. How can I obtain such a list?

Upvotes: 2

Views: 248

Answers (1)

boomz

Reputation: 667

The `DumpKeysFn` class seems to give all the keys (unique terms) of the collection. The code should look like this:

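// Imports assume the Galago 3.x package layout; adjust for your version.
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.lemurproject.galago.core.index.IndexPartReader;
import org.lemurproject.galago.core.index.KeyIterator;
import org.lemurproject.galago.core.index.disk.DiskIndex;

// fileName is the path to an index part file (for example, the postings part).
public static Set<String> getAllVocabularyTerms(String fileName) throws IOException {
    Set<String> result = new HashSet<>();
    IndexPartReader reader = DiskIndex.openIndexPart(fileName);
    try {
        // An empty index part has no keys, so return the empty set.
        if (reader.getManifest().get("emptyIndexFile", false)) {
            return result;
        }
        // Walk every key (term) in the index part and collect it.
        KeyIterator iterator = reader.getIterator();
        while (!iterator.isDone()) {
            result.add(iterator.getKeyString());
            iterator.nextKey();
        }
    } finally {
        // Close the reader even if iteration throws.
        reader.close();
    }
    return result;
}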

Upvotes: 1
