user3860465

Reputation: 55

Lucene.Net returning only one hit

Hi, I just started working with Lucene.Net today. After a lot of searching online, I found an approach for using it.

I want to detect a word in a text file on my local hard drive (D:). I am implementing it like this:

        string indexFileLocation = @"C:\Index";
        Directory dir = FSDirectory.Open(indexFileLocation);

        //create an analyzer to process the text
        Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexWriter indexWriter = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();

        Field fldContent = new Field
            ("text", System.IO.File.ReadAllText(@"D:\SampleDataFile.txt"),
            Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
        doc.Add(fldContent);
        indexWriter.AddDocument(doc);
        indexWriter.Optimize();
        //   indexWriter.Commit();
        indexWriter.Dispose();

        string strIndexDir = @"C:\Index";
        Analyzer std = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", std);
        Query qry = parser.Parse("not");
        Directory Drct = FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
        Searcher Srch = new IndexSearcher(IndexReader.Open(Drct,true));
        TopScoreDocCollector cllctr = TopScoreDocCollector.Create(100, true);
        Srch.Search(qry, cllctr);
        ScoreDoc[] hits = cllctr.TopDocs().ScoreDocs;

I am creating the index in a folder on the C: drive. My text file contains just the Macavity cat lyrics.

But the hit counts I am getting are all wrong. I tried:

             word      |  hits
            -----------+-------
             Macavity  |  1
             not       |  0
             And       |  0
             eyes      |  0

Every word I tried is in the lyrics, but none of them returns a hit except Macavity, which gives 1 hit. If I add more occurrences of 'Macavity', on the same line or on the next line, the hit count does not change: it is always 1.

Could someone please help me?

Upvotes: 0

Views: 55

Answers (1)

femtoRgon

Reputation: 33341

A hit is a matched document, not a match within a document. Since you have only one document, you will have a maximum of one hit.
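
If the goal is one hit per matching line, one option is to index each line as its own document. The following is only a sketch of that idea, reusing the calls from the question; it assumes Lucene.Net 3.0.x, and the class name LineIndexSketch is made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical class name; paths are the ones from the question.
static class LineIndexSketch
{
    static void Main()
    {
        Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_29);
        Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo(@"C:\Index"));

        // Index every non-empty line as a separate document.
        IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        foreach (string line in System.IO.File.ReadAllLines(@"D:\SampleDataFile.txt"))
        {
            if (line.Trim().Length == 0) continue;

            Document doc = new Document();
            doc.Add(new Field("text", line, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Dispose();

        // Search exactly as in the question; each matching line is now its own hit.
        QueryParser parser = new QueryParser(LuceneVersion.LUCENE_29, "text", analyzer);
        Query qry = parser.Parse("Macavity");
        Searcher searcher = new IndexSearcher(IndexReader.Open(dir, true));
        TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true);
        searcher.Search(qry, collector);
        ScoreDoc[] hits = collector.TopDocs().ScoreDocs;

        Console.WriteLine("Matching lines: " + hits.Length);
    }
}
```

With this layout the hit count is the number of matching lines (capped at the 100 requested from the collector), not the number of times the word occurs; a line containing the word twice still counts once.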

Also, "not" and "and" are both default English stop words. StandardAnalyzer removes them, so you cannot search for them. Searching for them is usually not useful in practice, but if you really want to, you can pass your own set of custom stop words into the StandardAnalyzer constructor.

Getting no matches on "eyes", though, seems odd. Perhaps there is something odd about what is being read from the file. I'd try debugging what System.IO.File.ReadAllText(@"D:\SampleDataFile.txt") looks like.
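
As a rough sketch of the custom stop word approach, the snippet below builds a StandardAnalyzer with an empty stop set so that no words are dropped. It assumes Lucene.Net 3.0.x on .NET 4 or later, where the constructor takes an ISet<string>; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using LuceneVersion = Lucene.Net.Util.Version;

// Hypothetical helper, not part of Lucene.Net.
static class StopWordSketch
{
    public static Analyzer CreateAnalyzerWithoutStopWords()
    {
        // An empty set means nothing is treated as a stop word,
        // so "not" and "and" are kept in the index.
        ISet<string> noStopWords = new HashSet<string>();
        return new StandardAnalyzer(LuceneVersion.LUCENE_29, noStopWords);
    }
}
```

The same analyzer would need to be passed to both the IndexWriter and the QueryParser, and the index rebuilt, since an analyzer that drops these words on either side removes them from that side. QueryParser also treats the uppercase words AND, OR and NOT as operators, so the lowercase forms are the ones to query.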
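
One way to do that check, using only plain .NET calls, might look like the following. The class name is hypothetical, and the 200-character preview length is an arbitrary choice.

```csharp
using System;
using System.IO;

// Hypothetical class name; the path is the one from the question.
static class ReadCheckSketch
{
    static void Main()
    {
        string text = File.ReadAllText(@"D:\SampleDataFile.txt");

        // A length far from the expected size, or a garbled preview,
        // would suggest an encoding problem in how the file was read.
        Console.WriteLine("Length: " + text.Length);
        Console.WriteLine("Contains \"eyes\": " +
            (text.IndexOf("eyes", StringComparison.OrdinalIgnoreCase) >= 0));
        Console.WriteLine(text.Substring(0, Math.Min(200, text.Length)));
    }
}
```

If the preview looks wrong, File.ReadAllText also has an overload that takes an explicit System.Text.Encoding, which may be worth trying.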

Upvotes: 1
