Michael

Reputation: 4351

Lucene slow retrieving documents by ID

I'm indexing data and I've noticed that searches take a long time. For each file, I store both its content and its absolute path:

    document.add(new StringField(SearchField.FILE_ABSOLUTE_PATH.getName(), fileData.getFilePath().toString(), Field.Store.YES));
    document.add(new TextField(SearchField.CONTENT.getName(), fileData.getContent(), Field.Store.YES));

After the search completes, my code loops through the matching document IDs and retrieves the file path stored in each document. This loop takes a very long time.

    final TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(query, collector);

    final TopDocs docs = searcher.search(query, Math.max(1, collector.getTotalHits()));
    final ScoreDoc[] hits = docs.scoreDocs;
    final SearchResult[] result = new SearchResult[hits.length];

    for(int i = 0; i < result.length; i++)
    {
        final Document document = reader.document(hits[i].doc);
        result[i] = new SearchResult(Paths.get(document.get(SearchField.FILE_ABSOLUTE_PATH.getName())));
    }

I'm wondering whether retrieving documents is slow because it has to load all of the file content from disk, even though I never access the CONTENT field. If that is the issue, I may have to change the way the data is stored.

What could be the cause of this?

Upvotes: 1

Views: 525

Answers (1)

femtoRgon

Reputation: 33341

Yes. reader.document(int) loads every stored field of the document, so if the content field is long, this is probably what is slowing things down.

Two solutions are available here:

  1. If you never need to get the content back from the index and only need to search it, change this field to be unstored:

    document.add(new TextField(SearchField.CONTENT.getName(), fileData.getContent(), Field.Store.NO));
    

    That reduces the size of each result passed back from the index, and the size of the index itself.

  2. If you do need the content field to be stored, but don't need its contents for this call, you can pass IndexReader.document a Set<String> containing the names of the fields you want returned:

    // Set.of requires Java 9+; on older versions use Collections.singleton(...)
    Set<String> getFields = Set.of(SearchField.FILE_ABSOLUTE_PATH.getName());
    for(int i = 0; i < result.length; i++)
    {
        final Document document = reader.document(hits[i].doc, getFields);
        ...
    }
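
To make the difference between the two calls concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's stored-fields implementation: `SelectiveLoadSketch`, `StoredRecord`, `loadAll`, `load` and the `bytesDecoded` counter are all hypothetical names made up for illustration. It only shows the general idea that a loader given a set of wanted field names can skip decoding everything else.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of selective field loading. NOT Lucene code; all names are hypothetical. */
final class SelectiveLoadSketch {

    /** Stand-in for one stored document: field name -> raw UTF-8 bytes. */
    static final class StoredRecord {
        private final Map<String, byte[]> raw = new LinkedHashMap<>();

        void put(String field, String value) {
            raw.put(field, value.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Number of bytes decoded by the most recent load call. */
    static long bytesDecoded;

    /** Decodes every stored field of the record. */
    static Map<String, String> loadAll(StoredRecord rec) {
        return load(rec, rec.raw.keySet());
    }

    /** Decodes only the fields named in {@code wanted} and skips the rest. */
    static Map<String, String> load(StoredRecord rec, Set<String> wanted) {
        bytesDecoded = 0;
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : rec.raw.entrySet()) {
            if (!wanted.contains(e.getKey())) {
                continue; // skipped fields are never turned into Strings
            }
            bytesDecoded += e.getValue().length;
            out.put(e.getKey(), new String(e.getValue(), StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // One small field and one large field, mirroring path + content.
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');

        StoredRecord rec = new StoredRecord();
        rec.put("path", "/tmp/a.txt");
        rec.put("content", new String(big));

        loadAll(rec);
        System.out.println("all fields: " + bytesDecoded + " bytes decoded");

        load(rec, Set.of("path"));
        System.out.println("path only:  " + bytesDecoded + " bytes decoded");
    }
}
```

How much time the real call saves depends on how your Lucene version stores and compresses the fields, so it is worth timing the loop before and after the change on your own index.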
    

Upvotes: 1
