I have a Lucene index whose .fdt file is about 5 GB. I add records to it often (about 1000 records per day), and none are ever deleted. Each document has 5 fields, and only one of them holds the text content of an HTML page. I run a query parser against this index to look for keywords. Even though the index is optimized after every insert, finding a keyword in the HTML text content takes almost a minute. Has anyone run into this problem, and are there any suggestions on how to resolve it?
These are the steps my code follows:
1. Using a SqlDataReader, get the contents of a table that contains title, EmployeeID, headline (a short description of the employee's department), Date (the date the employee was added to the table or their info changed), and data (an HTML version of the employee details).
2. For each record in the table, do the following:
// body = plain text stripped from the HTML of the web page or data (stripping code not shown)
var doc = new Document();
doc.Add(new Field("title", staticname, Field.Store.YES, Field.Index.ANALYZED)); //title is always "Employee info"
doc.Add(new Field("Employeeid", keyid.Replace(",", " "), Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("headline", head, Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("date", DateTools.DateToString(date, DateTools.Resolution.SECOND), Field.Store.YES, Field.Index.NOT_ANALYZED));
if (data == null)
    data = "";
else if (data.Length > 500)
{
    data = data.Substring(0, 500);
}
doc.Add(new Field("body", data, Field.Store.YES, Field.Index.ANALYZED));
indexWriter.AddDocument(doc);
indexWriter.Optimize();
indexWriter.Commit();
indexWriter.Dispose();
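The date field above is indexed NOT_ANALYZED, so each value becomes a single fixed-width token. Assuming DateTools.Resolution.SECOND produces a yyyyMMddHHmmss string, the BCL-only sketch below (the class and method names are hypothetical, not part of Lucene.Net) shows why that layout is useful: ordinal string order matches chronological order, which is what sorting and range queries on the field rely on. Any time-zone normalization the real DateTools applies is deliberately left out.

```csharp
using System;
using System.Globalization;

// Hypothetical illustration only; in the real code DateTools.DateToString does this.
public static class DateKeySketch
{
    // Formats a DateTime as a fixed-width, second-resolution key.
    // Assumption: this matches the yyyyMMddHHmmss layout of DateTools at SECOND
    // resolution; time-zone conversion is not performed here.
    public static string ToSecondKey(DateTime date)
    {
        return date.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
    }
}
```

For example, ToSecondKey(new DateTime(2013, 5, 7, 9, 30, 0)) returns "20130507093000", and because every key has the same width, an earlier date always compares lower with string.CompareOrdinal.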
In the search program:
string searchword = "disability";
QueryParser queryParser = new QueryParser(VERSION, "body", analyzer);
string word = "+Employeeid:" + Employeeid + " +body:" + searchword;
Query query = queryParser.Parse(word);
try
{
    IndexReader reader = IndexReader.Open(luceneIndexDirectory, true);
    Searcher indexSearch = new IndexSearcher(reader);
    TopDocs hits = indexSearch.Search(query, 1);
    if (hits.TotalHits > 0)
    {
        float score = hits.ScoreDocs[0].Score;
        if (score > MINSCORE)
        {
            results.Add(result); // results is a list holding EmployeeID, searchwordID, searchword, score
        }
    }
    indexSearch.Dispose();
    reader.Dispose();
    indexWriter.Dispose();
}
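The query string in the search program is built by plain concatenation, so an Employeeid or search word containing query-syntax characters (such as `:` or `-`) would be parsed as operators rather than as text. Lucene.Net's QueryParser exposes a static Escape method for this; the BCL-only sketch below, with hypothetical names QuerySketch, Escape and BuildQuery, mirrors the character set escaped by the Lucene 3.x classic query parser so the idea can be seen without the library.

```csharp
using System.Text;

// Hypothetical illustration only; in real code prefer QueryParser.Escape.
public static class QuerySketch
{
    // Characters treated as syntax by the Lucene 3.x classic query parser.
    private const string Special = "\\+-!():^[]\"{}~*?|&";

    // Prefixes each special character with a backslash so the parser
    // reads it as a literal part of the term.
    public static string Escape(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (Special.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds the same "+Employeeid:<id> +body:<word>" string as above;
    // the leading '+' marks each clause as required, so both must match.
    public static string BuildQuery(string employeeId, string searchWord)
    {
        return "+Employeeid:" + Escape(employeeId) + " +body:" + Escape(searchWord);
    }
}
```

With ordinary input the result is unchanged: BuildQuery("123", "disability") gives "+Employeeid:123 +body:disability", while Escape("a:b") gives a\:b.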
Any input is appreciated.
Thanks M
Do not store the body and headline fields in your index:
doc.Add(new Field("headline", head, Field.Store.NO, Field.Index.ANALYZED));
doc.Add(new Field("body", data, Field.Store.NO, Field.Index.ANALYZED));
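Storing them does not help the search itself; stored values only matter if you need to read the original text back from the hits.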