Reputation: 185
This is my code to search the Lucene index:
String DocPath = @"c:\Test1.txt";
if (File.Exists(DocPath))
{
    // Read the whole file; its text is used as the query string.
    StringBuilder Content = new StringBuilder();
    using (StreamReader FileReader = new StreamReader(DocPath))
    {
        Content.Append(FileReader.ReadToEnd());
    }
    if (Content.ToString().Trim() != "")
    {
        FSDirectory Direc = FSDirectory.Open(new DirectoryInfo(IndexDir));
        IndexReader Reader = IndexReader.Open(Direc, true); // true = read-only
        IndexSearcher searcher = new IndexSearcher(Reader);
        QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "Content", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29, new FileInfo(Application.StartupPath + Path.DirectorySeparatorChar + "noise.dat")));
        // Allow as many clauses as the input has characters.
        BooleanQuery.MaxClauseCount = Convert.ToInt32(Content.ToString().Length);
        Query query = parser.Parse(QueryParser.Escape(Content.ToString().ToLower()));
        TopDocs docs = searcher.Search(query, Reader.MaxDoc);
    }
}
In this code I open a 15 MB text file and pass its contents to the index searcher as the query. The search takes a very long time and apparently throws an OutOfMemoryException. Even parsing the query takes a long time. The index holds around 16K docs.
Upvotes: 0
Views: 456
Reputation: 5246
I suggest you change your approach. With each document, store an additional field that contains a hash of the file, such as an MD5 hash.
Compute the hash of your input, issue a query for that hash, and compare the matching documents with your input for equality.
This will be a lot more robust, and will probably perform better too.
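As a rough sketch of the idea, assuming Lucene.Net 3.0.x: the field name "ContentHash" and the helper class HashLookup are made up for illustration and are not part of your code. You would need to re-index so that every document carries the hash field.

```csharp
using System;
using System.Security.Cryptography;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class HashLookup
{
    // Hypothetical field name; pick whatever suits your index.
    public const string HashField = "ContentHash";

    // Lowercase hex MD5 of the raw file bytes.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(data);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Index time: store the hash as a single un-analyzed token.
    public static void AddHashField(Document doc, byte[] fileBytes)
    {
        doc.Add(new Field(HashField, Md5Hex(fileBytes),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
    }

    // Search time: one TermQuery replaces the huge parsed query.
    public static TopDocs FindByHash(IndexSearcher searcher, byte[] inputBytes, int maxHits)
    {
        Query query = new TermQuery(new Term(HashField, Md5Hex(inputBytes)));
        return searcher.Search(query, maxHits);
    }
}
```

Two different files can share a hash, so each hit from FindByHash should still be compared with the input (for example against the stored content, if you store it) before you treat it as a match. Hash the same bytes at index time and at search time, otherwise identical files will not match.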
Upvotes: 2