Reputation: 157
I know similar questions have already been asked, but I cannot find any answers that suit what I am looking for.
Basically, I want to search for phrases and return only documents whose content is exactly that phrase, with no partial matches.
E.g., a document containing "This is a phrase" should not be returned when I search for "This is".
Taking this example: Exact Phrase search using Lucene?
"foo bar" should not return a hit because it is only a partial match. A full match, which is what I'm looking for, would be "foo bar baz".
Here is the code; credit goes to WhiteFang34 for posting it in the question linked above (I have simply converted it to C#):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Documents;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis;
using Lucene.Net.Store;
using Lucene.Net.Index;

namespace LuceneStatic
{
    public static class LuceneStatic
    {
        public static void LucenePhraseQuery()
        {
            // setup Lucene to use an in-memory index
            Lucene.Net.Store.Directory directory = new RAMDirectory();
            Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
            var mlf = Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED;
            IndexWriter writer = new IndexWriter(directory, analyzer, true, mlf);

            // index a few documents
            writer.AddDocument(createDocument("1", "foo bar baz"));
            writer.AddDocument(createDocument("2", "red green blue"));
            writer.AddDocument(createDocument("3", "test foo bar test"));
            writer.Close();

            // search for documents that have "foo bar" in them
            String sentence = "foo bar";
            IndexSearcher searcher = new IndexSearcher(directory, true);
            PhraseQuery query = new PhraseQuery();
            string[] words = sentence.Split(' ');
            foreach (var word in words)
            {
                query.Add(new Term("contents", word));
            }

            // display search results
            List<string> results = new List<string>();
            TopDocs topDocs = searcher.Search(query, 10);
            foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
            {
                Document doc = searcher.Doc(scoreDoc.doc);
                results.Add(doc.Get("contents"));
            }
        }

        private static Document createDocument(string id, string content)
        {
            Document doc = new Document();
            doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
            return doc;
        }
    }
}
I have played around with this using different analyzers and different approaches, but I cannot get the required results. I need a match for the full phrase "foo bar baz", but "foo bar" should not return any hits.
Upvotes: 4
Views: 6518
Reputation: 5246
Index your data using the Field.Index.NOT_ANALYZED parameter when you create the field. This will cause the entire value to be indexed as a single Term.
You may then search against it using a simple TermQuery.
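As a rough sketch of the approach above, assuming the same Lucene.Net 2.9-era API used in the question (IndexWriter.Close(), ScoreDoc.doc, and so on; newer versions rename some of these members), the indexing and search could look something like this. The class name ExactMatchSketch and the Search helper are made up for illustration only:

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper, not part of Lucene.Net.
public static class ExactMatchSketch
{
    public static List<string> Search(string phrase)
    {
        // In-memory index, same setup as in the question.
        Lucene.Net.Store.Directory directory = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

        foreach (string content in new[] { "foo bar baz", "red green blue", "test foo bar test" })
        {
            Document doc = new Document();
            // NOT_ANALYZED: the whole string is indexed as one term, untokenized.
            doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Close();

        // A TermQuery matches only documents whose term equals the phrase exactly.
        IndexSearcher searcher = new IndexSearcher(directory, true);
        TermQuery query = new TermQuery(new Term("contents", phrase));
        TopDocs topDocs = searcher.Search(query, 10);

        List<string> results = new List<string>();
        foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
        {
            results.Add(searcher.Doc(scoreDoc.doc).Get("contents"));
        }
        searcher.Close();
        return results;
    }
}
```

With this, Search("foo bar baz") should return the first document, while Search("foo bar") should return nothing. Note that because the field is not analyzed, no lowercasing or other normalization is applied, so the match is case-sensitive and whitespace-sensitive. If you also need ordinary tokenized search, one option is to index the same text a second time in a separate ANALYZED field.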
Upvotes: 4