Reputation: 346
I was wondering if it is possible to index and store a number of large files, each over 4GB in size. I had no problems indexing and searching the documents, with one exception - I was not able to retrieve and highlight the content of matched documents. The code below allows me to create a searchable index without running out of memory.
var doc = new Document();
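// Reader-based overload: the contents are tokenized/analyzed but never stored in the index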
doc.Add(new Field(string, TextReader));
Changing it to the line below would eventually result in an out-of-memory exception.
new Field(string, TextReader.ReadToEnd(), Field.Store.YES, Field.Index.ANALYZED)
I was able to index and store 28 files of 150MB each, which allowed me to search and retrieve the matched text. However, query performance was unacceptable, and after two or three searches an out-of-memory exception would be thrown. I understand the reason for the exception and why it occurs. The question for the community is: am I missing something? Is there functionality within the Lucene API that addresses my problem? I already have a solution that splits the files; what I would like is to achieve the same result without having to create file chunks or scale the application horizontally across multiple servers.
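For context, the indexing loop looks roughly like the sketch below (assuming Lucene.NET 3.x; the "content" and "path" field names are just placeholders used here):

using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class LargeFileIndexer
{
    public static void IndexFile(string indexPath, string filePath)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var dir = FSDirectory.Open(new DirectoryInfo(indexPath)))
        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        using (var reader = new StreamReader(filePath))
        {
            var doc = new Document();
            // Tokenized but not stored: the reader is consumed as a stream, so memory use stays flat.
            doc.Add(new Field("content", reader));
            // Stored, not analyzed: lets me locate the original file for a given hit.
            doc.Add(new Field("path", filePath, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }
    }
}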
Thanks in advance!
Upvotes: 0
Views: 695
Reputation: 26733
Do you really need to store those files in the Lucene index? This just adds overhead and slows everything down.
Simply store these files in the file system and keep a path reference in the Lucene document (e.g. /path/to/file).
Indexing the contents should be fine though, provided you have an adequate amount of RAM available.
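A rough sketch of one way to wire that up at query time (assuming Lucene.NET 3.x plus the contrib Highlighter, and hypothetical "content"/"path" field names): pull the stored path from each hit and highlight the file by streaming it from disk in chunks, so the full text is never stored in the index or loaded into memory at once. Matches that span a chunk boundary can be missed with this naive chunking.

using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class LargeFileSearcher
{
    public static void SearchAndHighlight(string indexPath, string queryText)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var dir = FSDirectory.Open(new DirectoryInfo(indexPath)))
        using (var searcher = new IndexSearcher(dir, true)) // read-only searcher
        {
            var query = new QueryParser(Version.LUCENE_30, "content", analyzer).Parse(queryText);
            var highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));

            foreach (var hit in searcher.Search(query, 10).ScoreDocs)
            {
                // Only the path is stored in the index; the text itself lives on disk.
                var path = searcher.Doc(hit.Doc).Get("path");

                using (var reader = new StreamReader(path))
                {
                    var buffer = new char[32 * 1024]; // highlight chunk by chunk, never the whole file
                    int read;
                    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        var chunk = new string(buffer, 0, read);
                        var tokens = analyzer.TokenStream("content", new StringReader(chunk));
                        var fragment = highlighter.GetBestFragment(tokens, chunk);
                        if (fragment != null)
                            Console.WriteLine("{0}: {1}", path, fragment);
                    }
                }
            }
        }
    }
}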
Upvotes: 1