user1662409

Reputation: 157

Is it possible to find exact matches only when searching for a phrase in Lucene.net?

I know similar questions have already been asked, but I cannot find any answers that suit what I am looking for.

Basically, I want to search for phrases and return only documents whose content is exactly that phrase, not partial matches.

For example, if a document contains "This is a phrase", a search for "This is" should not return a hit.

Taking this example: Exact Phrase search using Lucene?

"foo bar" should not return a hit because it is only a partial match. A full match, which is what I'm looking for, would be "foo bar baz".

Here is the code; credit goes to WhiteFang34 for posting this in the above link (I have simply converted it to C#):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Documents;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis;
using Lucene.Net.Store;
using Lucene.Net.Index;

namespace LuceneStatic
{
    public static class LuceneStatic
    {
        public static void LucenePhraseQuery()
        {
            // setup Lucene to use an in-memory index
            Lucene.Net.Store.Directory directory = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
            var mlf = Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED;
            IndexWriter writer = new IndexWriter(directory, analyzer, true, mlf);

            // index a few documents
            writer.AddDocument(createDocument("1", "foo bar baz"));
            writer.AddDocument(createDocument("2", "red green blue"));
            writer.AddDocument(createDocument("3", "test foo bar test"));
            writer.Close();

            // search for documents that have "foo bar" in them
            String sentence = "foo bar";
            IndexSearcher searcher = new IndexSearcher(directory, true);
            PhraseQuery query = new PhraseQuery();
            string[] words = sentence.Split(' ');
            foreach (var word in words)
            {
                query.Add(new Term("contents", word));
            }

            // display search results
            List<string> results = new List<string>();
            TopDocs topDocs = searcher.Search(query, 10);
            foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
            {
                Document doc = searcher.Doc(scoreDoc.doc);
                results.Add(doc.Get("contents"));
            }
        }

        private static Document createDocument(string id, string content)
        {
            Document doc = new Document();
            doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.ANALYZED,
                    Field.TermVector.WITH_POSITIONS_OFFSETS));
            return doc;
        }
    }
}

I have played around with this using different analyzers and different approaches, but I cannot get the required results. I need a match for the full phrase "foo bar baz", but "foo bar" should not return any hits.

Upvotes: 4

Views: 6518

Answers (1)

Jf Beaulac

Reputation: 5246

Index your data using the Field.Index.NOT_ANALYZED parameter when you create the field. This will cause the entire value to be indexed as a single Term.

You may then search against it using a simple TermQuery.

https://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/all/org/apache/lucene/document/Field.Index.html#NOT_ANALYZED
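
As a rough sketch of that approach, the snippet below reuses the three documents from the question and adds a second, non-analyzed copy of the text to each one. The field name "contents_exact", the class name ExactMatchSketch and the CountHits helper are made up for illustration, and the calls assume the same Lucene.Net 2.9-era API that the question's code uses. Keeping the analyzed "contents" field alongside is optional; it is only there so that ordinary phrase searches keep working.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class ExactMatchSketch
{
    public static void Run()
    {
        // Same in-memory setup as in the question.
        Lucene.Net.Store.Directory directory = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        writer.AddDocument(CreateDocument("1", "foo bar baz"));
        writer.AddDocument(CreateDocument("2", "red green blue"));
        writer.AddDocument(CreateDocument("3", "test foo bar test"));
        writer.Close();

        IndexSearcher searcher = new IndexSearcher(directory, true);

        // "foo bar" is only part of a stored value, so no term equals it.
        Console.WriteLine(CountHits(searcher, "foo bar"));      // expected: 0
        // "foo bar baz" is the whole value of document 1.
        Console.WriteLine(CountHits(searcher, "foo bar baz"));  // expected: 1
    }

    // Hypothetical helper: looks up the whole phrase as one single term.
    private static int CountHits(IndexSearcher searcher, string phrase)
    {
        Query query = new TermQuery(new Term("contents_exact", phrase));
        TopDocs topDocs = searcher.Search(query, 10);
        return topDocs.ScoreDocs.Length;
    }

    private static Document CreateDocument(string id, string content)
    {
        Document doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Analyzed copy, as in the question, for normal phrase queries.
        doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.ANALYZED));
        // Non-analyzed copy: the entire string is indexed as one term.
        doc.Add(new Field("contents_exact", content, Field.Store.NO,
                Field.Index.NOT_ANALYZED));
        return doc;
    }
}
```

Note that a non-analyzed field is not lowercased or otherwise normalized, so the search string has to match the indexed value character for character, including case and whitespace. If that is too strict, one option is to normalize the text yourself (for example, lowercase and trim it) both when indexing and when searching.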

Upvotes: 4
