user2726975

Reputation: 1353

Lucene Index grown too large

I have a Lucene index whose .FDT file is about 5 GB. I add records to it frequently (about 1000 records per day), and none are ever deleted. Each document has 5 fields, and only one of them holds the text content of an HTML page. I also run a query parser against this index to look for certain keywords. Even though the index is optimized every time I insert, it takes almost a minute to find a keyword in the text content of the HTML page. Has anyone run into this problem, and are there any suggestions on how to resolve it?

These are the steps that I do in my code:

1. Using a SqlDataReader, get the contents of a table that contains title, EmployeeID, headline (short description of the employee's department), Date (the date this employee was added to the table or their info changed), and data (HTML version of the employee details).
2. For each record in the table, do the following:

// body = text stripped from the HTML of the web page or data
var doc = new Document();
doc.Add(new Field("title", staticname, Field.Store.YES, Field.Index.ANALYZED)); // title is always "Employee info"
doc.Add(new Field("Employeeid", keyid.Replace(",", " "), Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("headline", head, Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("date", DateTools.DateToString(date, DateTools.Resolution.SECOND), Field.Store.YES, Field.Index.NOT_ANALYZED));
if (data == null)
    data = "";
else if (data.Length > 500)
{
    data = data.Substring(0, 500);
}
doc.Add(new Field("body", data, Field.Store.YES, Field.Index.ANALYZED));
indexWriter.AddDocument(doc);
indexWriter.Optimize();
indexWriter.Commit();
indexWriter.Dispose();

In the search program:

string searchword = "disability";
QueryParser queryParser = new QueryParser(VERSION, "body", analyzer);
string word = "+Employeeid:" + Employeeid + " +body:" + searchword;
Query query = queryParser.Parse(word);

try
{
    IndexReader reader = IndexReader.Open(luceneIndexDirectory, true);
    Searcher indexSearch = new IndexSearcher(reader);
    TopDocs hits = indexSearch.Search(query, 1);

    if (hits.TotalHits > 0)
    {
        float score = hits.ScoreDocs[0].Score;
        if (score > MINSCORE)
        {
            results.Add(result);  // it is a list that has EmployeeID, searchwordID, searchword, score
        }
    }

    indexSearch.Dispose();
    reader.Dispose();
    indexWriter.Dispose();
}

Any input is appreciated.

Thanks M

Upvotes: 0

Views: 248

Answers (1)

WingFeng

Reputation: 21

Do not store the body and headline fields in your index.

 doc.Add(new Field("headline", head, Field.Store.NO, Field.Index.ANALYZED));
 doc.Add(new Field("body", data, Field.Store.NO, Field.Index.ANALYZED));

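Storing them is useless for search. Stored values are only needed to read the original text back from a hit, and stored field data is what the .FDT file holds, so leaving these fields un-stored keeps that file small.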
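
To illustrate the point, here is a minimal sketch, assuming Lucene.Net 3.0.3 and an in-memory RAMDirectory used only for the demo. The class name and the sample values ("12345" and the body text) are hypothetical. It indexes the body with Field.Store.NO and shows that the field can still be searched, while only the stored Employeeid can be read back from the hit.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical demo class; assumes Lucene.Net 3.0.3.
class UnstoredBodyDemo
{
    static void Main()
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // RAMDirectory is used here only so the sketch is self-contained.
        using (var dir = new RAMDirectory())
        {
            using (var writer = new IndexWriter(dir, analyzer, true,
                       IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                // Stored: can be retrieved from a search hit.
                doc.Add(new Field("Employeeid", "12345",
                    Field.Store.YES, Field.Index.ANALYZED));
                // Indexed but not stored: searchable, not retrievable.
                doc.Add(new Field("body", "short term disability leave policy",
                    Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
                writer.Commit();
            }

            using (var reader = IndexReader.Open(dir, true))
            using (var searcher = new IndexSearcher(reader))
            {
                var parser = new QueryParser(Version.LUCENE_30, "body", analyzer);
                TopDocs hits = searcher.Search(parser.Parse("disability"), 1);

                if (hits.TotalHits > 0)
                {
                    Document found = searcher.Doc(hits.ScoreDocs[0].Doc);
                    // The stored field comes back with the hit.
                    Console.WriteLine(found.Get("Employeeid"));
                    // Get returns null for a field that was not stored.
                    Console.WriteLine(found.Get("body") == null);
                }
            }
        }
    }
}
```

If the search program only needs the EmployeeID and the score, as in the question, nothing is lost by not storing the body. The original HTML can always be fetched from the database by EmployeeID.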

Upvotes: 2
