inverted_index

Reputation: 2437

How to get internal doc id set by lucene

I've indexed some documents in my indexing module. As I understand it, Lucene assigns an internal ID to every indexed document. These IDs may not follow any particular order, though: for example, the first document's ID might be 127, the second one's 133, and so on.

In my search module I have the Document I want to process, but I need the doc ID that Lucene assigned to it at index time. See the code below:

private long calculateProbabilityOfDocument(String topic, Document doc) {
  // DOCID is the internal Lucene doc ID, which I don't know how to obtain from doc
  Terms termVector = iReader.getTermVector(DOCID, FIELD);
}

EDIT:

I think Lucene may not let me access the internal IDs. Is there any other approach?

Thanks in advance!

Upvotes: 3

Views: 5034

Answers (1)

inverted_index

Reputation: 2437

I finally found a solution.

It turns out that Lucene does not expose the internal ID on a Document object, but every search hit carries it in ScoreDoc.doc. So we can run a query, iterate over the hits, and fetch each document's term vector by that ID. This was the only way I found to get the term vectors. I'm using the code below:

// Assumes "Body" was indexed with term vectors enabled; otherwise getTermVector returns null.
QueryParser parser = new QueryParser("Body", new EnglishAnalyzer());
Query query = parser.parse(topic);
TopDocs hits = iSearcher.search(query, 1000);  // keep at most the top 1000 matches
for (ScoreDoc hit : hits.scoreDocs) {
    int docId = hit.doc;  // Lucene's internal document ID for this hit
    Terms termVector = iSearcher.getIndexReader().getTermVector(docId, "Body");
    Document doc = iSearcher.doc(docId);
    documentsList.put(doc, termVector);
}
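
To show why the ID carried by a search hit is all you need, here is a toy in-memory model in plain Java. It is not Lucene's API: ToyIndex, add, search and getTermVector are hypothetical names made up for this sketch, and the sequential IDs are a simplification. It only mirrors the flow above: the index hands out an internal ID at indexing time, a search returns those IDs, and the per-document term vector is looked up by ID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an index; this is NOT Lucene, only an illustration.
class ToyIndex {
    // Per-document term vectors; the list position is the internal doc ID.
    private final List<Map<String, Integer>> termVectors = new ArrayList<>();
    // Inverted index: term -> IDs of the documents that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Indexes a text and returns the internal ID assigned to it.
    int add(String text) {
        int docId = termVectors.size();
        Map<String, Integer> vector = new TreeMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace
            }
            vector.merge(term, 1, Integer::sum); // count term frequency
        }
        for (String term : vector.keySet()) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        termVectors.add(vector);
        return docId;
    }

    // A "search" returns internal doc IDs, playing the role of ScoreDoc.doc.
    List<Integer> search(String term) {
        List<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptyList() : ids;
    }

    // Looks up the term vector (term -> frequency) by internal doc ID.
    Map<String, Integer> getTermVector(int docId) {
        return termVectors.get(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Lucene assigns internal ids");
        index.add("search hits carry ids");
        // Same shape as the loop above: take the ID from the hit, then fetch by ID.
        for (int docId : index.search("ids")) {
            System.out.println(docId + " -> " + index.getTermVector(docId));
        }
    }
}
```

The point of the sketch is that the caller never has to guess an ID: it always comes back from the index itself, either when the document is added or as part of a search result.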

Upvotes: 5
