user1051536

Reputation: 185

Lucene IndexSearcher causing OutOfMemoryException

This is my code to search the Lucene index:

String DocPath = @"c:\Test1.txt";
if (File.Exists(DocPath))
{
    // Read the whole text file; its entire contents become the query text
    string Content;
    using (StreamReader FileReader = new StreamReader(DocPath))
    {
        Content = FileReader.ReadToEnd();
    }

    if (Content.Trim() != "")
    {
        FSDirectory Direc = FSDirectory.Open(new DirectoryInfo(IndexDir));
        IndexReader Reader = IndexReader.Open(Direc, true); // read-only reader
        IndexSearcher searcher = new IndexSearcher(Reader);
        QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "Content",
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29,
                new FileInfo(Application.StartupPath + Path.DirectorySeparatorChar + "noise.dat")));
        // Lifts the clause limit to the input length, so the file's terms can all become clauses
        BooleanQuery.MaxClauseCount = Content.Length;
        Query query = parser.Parse(QueryParser.Escape(Content.ToLower()));
        // Requests up to one hit per document in the index
        TopDocs docs = searcher.Search(query, Reader.MaxDoc);
    }
}

In this code I open a 15 MB text file and pass its entire contents to the index searcher as the query. The search takes a very long time and then throws an OutOfMemoryException; even parsing the query is slow. The index contains around 16K documents.

Upvotes: 0

Views: 456

Answers (1)

Jf Beaulac

Reputation: 5246

I suggest you change your approach. When you index each document, store an additional field containing a hash of the file, for example an MD5 hash.
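A minimal sketch of the indexing side, assuming Lucene.Net 3.0.x (the question's `LUCENE_30` suggests that version). The field names `Hash` and `Path` and the `HashIndexing` class are hypothetical. The MD5 is computed over the raw file bytes with .NET's `System.Security.Cryptography.MD5` and stored as one un-analyzed term, so it can later be matched exactly.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using Lucene.Net.Documents;

// Hypothetical helper; not part of Lucene.Net.
public static class HashIndexing
{
    // Returns the MD5 of the file's bytes as 32 lowercase hex characters.
    public static string ComputeMd5Hex(string filePath)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(filePath))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Adds the hypothetical "Hash" and "Path" fields to a document before it is indexed.
    // NOT_ANALYZED keeps the hex string as a single exact term for a later TermQuery;
    // "Path" is only stored (not indexed) so a match can be checked against the original file.
    public static void AddHashFields(Document doc, string filePath)
    {
        doc.Add(new Field("Hash", ComputeMd5Hex(filePath), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("Path", filePath, Field.Store.YES, Field.Index.NO));
    }
}
```

The hash is taken over bytes rather than text so that the value does not depend on how the file is decoded.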

Compute the hash of your input, issue a query for that hash, and then compare the matching documents with your input for equality, since two different files can in principle share the same hash.

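This approach is much more robust, and will probably perform better too: the lookup becomes a single exact-term query rather than a BooleanQuery with a clause for every term in a 15 MB file.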
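The query-and-compare step could look roughly like this, again assuming Lucene.Net 3.0.x and that each document was indexed with the hypothetical `Hash` (un-analyzed MD5 hex) and `Path` (stored file path) fields. `HashLookup` and `FindIdenticalFiles` are illustrative names, and the caller is assumed to pass in the input's MD5 hex string computed the same way as at index time.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper; not part of Lucene.Net.
public static class HashLookup
{
    // Returns the stored paths of indexed files whose content is byte-for-byte identical to inputPath.
    public static List<string> FindIdenticalFiles(IndexSearcher searcher, string inputPath, string inputHashHex)
    {
        var matches = new List<string>();
        byte[] inputBytes = File.ReadAllBytes(inputPath);

        // One exact term, instead of one clause per word of the input file
        Query query = new TermQuery(new Term("Hash", inputHashHex));
        TopDocs hits = searcher.Search(query, 10); // only exact duplicates match, so a small cap suffices

        foreach (ScoreDoc hit in hits.ScoreDocs)
        {
            Document doc = searcher.Doc(hit.Doc);
            string candidatePath = doc.Get("Path");

            // Equal hashes are not proof of equal content, so confirm byte-for-byte
            if (candidatePath != null && File.Exists(candidatePath)
                && File.ReadAllBytes(candidatePath).SequenceEqual(inputBytes))
            {
                matches.Add(candidatePath);
            }
        }
        return matches;
    }
}
```

Only the few documents sharing the hash are ever loaded, so memory use no longer grows with the size of the input or the index.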

Upvotes: 2
