Reputation: 2437
I've indexed some documents in the index module. As I understand it, Lucene assigns an internal ID to every indexed document, and these IDs don't necessarily follow a particular order. For example, the first document's ID is 127, the second one's is 133, and so on.
In the search module I have the document I want to process, but I need the doc ID that Lucene assigned to it at index time. See the code below:
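private long calculateProbabilityOfDocument(String topic, Document doc){
    // DOCID should be Lucene's internal ID for doc, which is what I can't find
    Terms termVector = iReader.getTermVector(DOCID, FIELD);
    // ... use termVector to compute and return the probability
}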
EDIT:
I think Lucene may not let me access the internal IDs. Is there any other approach?
Thanks in advance!
Upvotes: 3
Views: 5034
I finally found a solution.
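I found out that Lucene does not let you get the internal document ID back from a Document object you already hold. The hits of a search do carry it, though: each ScoreDoc.doc is the internal ID. So we can iterate through the hits and fetch each document's TermVector. This seems to be the most practical way to get the term vectors. I'm using the code below: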
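QueryParser parser = new QueryParser("Body", new EnglishAnalyzer());
Query query = parser.parse(topic);  // throws ParseException on malformed input
TopDocs hits = iSearcher.search(query, 1000);  // at most the top 1000 hits
for (ScoreDoc hit : hits.scoreDocs) {
    // hit.doc is Lucene's internal document ID for this searcher's reader
    // getTermVector returns null unless "Body" was indexed with term vectors
    Terms termVector = iSearcher.getIndexReader().getTermVector(hit.doc, "Body");
    Document doc = iSearcher.doc(hit.doc);
    documentsList.put(doc, termVector);  // documentsList is assumed to be a Map<Document, Terms>
}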
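For reference, here is a self-contained sketch of the same approach. It assumes the Lucene 8.x/9.x API with lucene-core and lucene-queryparser on the classpath. It also shows two details the snippet above relies on. First, the Body field must be indexed with term vectors enabled, or getTermVector returns null. Second, the returned Terms can be walked with a TermsEnum to get each term's frequency in that one document. StandardAnalyzer stands in for EnglishAnalyzer only to avoid the extra analysis module. The sample texts, the class and method names, and the limit of 10 hits are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class TermVectorsFromHits {

    // "Body" is stored, tokenized, and indexed with term vectors.
    static final FieldType BODY_TYPE = new FieldType(TextField.TYPE_STORED);
    static {
        BODY_TYPE.setStoreTermVectors(true);
        BODY_TYPE.freeze();
    }

    // Indexes the given bodies, searches for topic, and maps each hit's Body text
    // to {term -> frequency of that term in that document}.
    static Map<String, Map<String, Long>> termVectorsForTopic(String[] bodies, String topic) throws Exception {
        Map<String, Map<String, Long>> result = new LinkedHashMap<>();
        try (Analyzer analyzer = new StandardAnalyzer(); Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String body : bodies) {
                    Document doc = new Document();
                    doc.add(new Field("Body", body, BODY_TYPE));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("Body", analyzer).parse(topic);
                TopDocs hits = searcher.search(query, 10);
                for (ScoreDoc hit : hits.scoreDocs) {
                    // hit.doc is the internal doc ID, valid only for this reader
                    Terms termVector = reader.getTermVector(hit.doc, "Body");
                    Map<String, Long> freqs = new LinkedHashMap<>();
                    if (termVector != null) {
                        TermsEnum termsEnum = termVector.iterator();
                        BytesRef term;
                        while ((term = termsEnum.next()) != null) {
                            // For a term vector, totalTermFreq() is the frequency within this document
                            freqs.put(term.utf8ToString(), termsEnum.totalTermFreq());
                        }
                    }
                    result.put(searcher.doc(hit.doc).get("Body"), freqs);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String[] bodies = {"lucene stores term vectors", "vectors of vectors", "nothing relevant here"};
        termVectorsForTopic(bodies, "vectors")
                .forEach((body, freqs) -> System.out.println(body + " -> " + freqs));
    }
}
```

As far as I know, recent 9.x releases deprecate IndexReader.getTermVector and IndexSearcher.doc in favor of IndexReader.termVectors() and IndexSearcher.storedFields(). Treat those two calls as version-dependent.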
Upvotes: 5