Reputation: 12695
I have stored the following documents in my lucene index:
{
  "id": 1,
  "name": "John Smith",
  "description": "worker",
  "additionalData": "faster data",
  "attributes": "is_hired=not"
},
{
  "id": 2,
  "name": "Alan Smith",
  "description": "hired",
  "additionalData": "faster drive",
  "attributes": "is_hired=not"
},
{
  "id": 3,
  "name": "Mike Std",
  "description": "hired",
  "additionalData": "faster check",
  "attributes": "is_hired=not"
}
Now I want to search the fields to check whether any of the given values exists:
search term: "John data check"
This should return the documents with IDs 1 and 3, but it doesn't. Why?
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
BooleanQuery mainQuery = new BooleanQuery();
mainQuery.MinimumNumberShouldMatch = 1;
var cols = new string[] {
    "name",
    "additionalData"
};
string[] words = searchData.text.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
var queryParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, cols, analyzer);
foreach (var word in words)
{
    BooleanQuery innerQuery = new BooleanQuery();
    innerQuery.MinimumNumberShouldMatch = 1;
    innerQuery.Add(queryParser.Parse(word), Occur.SHOULD);
    mainQuery.Add(innerQuery, Occur.MUST);
}
TopDocs hits = searcher.Search(mainQuery, null, int.MaxValue, Sort.RELEVANCE);
// hits.TotalHits is 0!
Upvotes: 2
Views: 328
Reputation: 12695
In my case I had stored a string array under a single field name, so I had to retrieve all field values from the result Document, because Document.Get("field_name") returns only the first value when several fields share the same name:
var multi_fields = doc.GetFields("field_name");
var field_values = multi_fields.Select(x => x.StringValue).ToArray();
In addition, I had to enable wildcard search, because the query finds nothing when I type only part of a word, e.g. Jo instead of John:
string[] words = "Jo data check".Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries).Select(x => string.Format("*{0}*", x)).ToArray();
var queryParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, cols, analyzer);
queryParser.AllowLeadingWildcard = true; // set on the same parser instance that parses the words
Upvotes: 0
Reputation: 51330
The query you constructed basically requires all three words to match.
You wrap each word in a BooleanQuery with a single SHOULD clause. This is equivalent to using the inner query directly: a boolean query whose only clause is a SHOULD clause matches exactly when that clause matches, so the wrapper is just an indirection that does not change the behavior of the query.
Then you add each of these wrappers to another boolean query, this time with a MUST clause for each. This means every clause must match for the query to match.
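For a BooleanQuery to match, all MUST clauses have to be satisfied, and at least MinimumNumberShouldMatch SHOULD clauses have to be satisfied as well. That property defaults to 0, which makes SHOULD clauses optional unless the query has no MUST clauses, in which case at least one SHOULD clause has to match. Leave the property at its default value, as the documented behavior is: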
By default no optional clauses are necessary for a match (unless there are no required clauses).
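To make the rule concrete, here is a minimal sketch that models it in plain C#, without Lucene.Net. The names ClauseKind, BooleanMatchSketch, and Matches are made up for this illustration and are not part of the library, and MUST_NOT clauses are left out. It shows why document 1 from the question (which contains "john" and "data" but not "check") is rejected when every word is a MUST clause and accepted when every word is a SHOULD clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for illustration only; this is not a Lucene.Net type.
enum ClauseKind { Must, Should }

static class BooleanMatchSketch
{
    // Each clause is a pair: its kind, and whether it matched the document.
    public static bool Matches(
        IEnumerable<(ClauseKind Kind, bool Matched)> clauses,
        int minimumShouldMatch = 0)
    {
        var list = clauses.ToList();
        var must = list.Where(c => c.Kind == ClauseKind.Must).ToList();
        var shouldHits = list.Count(c => c.Kind == ClauseKind.Should && c.Matched);

        // Every required clause has to match.
        if (must.Any(c => !c.Matched)) return false;

        // Without required clauses, at least one optional clause has to match.
        var needed = must.Count == 0 ? Math.Max(1, minimumShouldMatch) : minimumShouldMatch;
        return shouldHits >= needed;
    }
}

static class Program
{
    static void Main()
    {
        // Document 1: "john" matches, "data" matches, "check" does not.
        var hits = new[] { true, true, false };

        var allMust = hits.Select(h => (ClauseKind.Must, h));
        var allShould = hits.Select(h => (ClauseKind.Should, h));

        Console.WriteLine(BooleanMatchSketch.Matches(allMust));   // False
        Console.WriteLine(BooleanMatchSketch.Matches(allShould)); // True
    }
}
```

Since no single document in the question contains all three words, the all-MUST form rejects every document, which is consistent with the 0 hits reported.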
Effectively, your query is (assuming there is no MultiFieldQueryParser, for simplicity):
+(john) +(data) +(check)
Or, in a tree form:
BooleanQuery
    MUST: BooleanQuery
        SHOULD: TermQuery: john
    MUST: BooleanQuery
        SHOULD: TermQuery: data
    MUST: BooleanQuery
        SHOULD: TermQuery: check
Which can be simplified to:
BooleanQuery
    MUST: TermQuery: john
    MUST: TermQuery: data
    MUST: TermQuery: check
But the query you want is:
BooleanQuery
    SHOULD: TermQuery: john
    SHOULD: TermQuery: data
    SHOULD: TermQuery: check
So, remove the mainQuery.MinimumNumberShouldMatch = 1; line, then replace your foreach body with the following, and it should get the job done:
mainQuery.Add(queryParser.Parse(word), Occur.SHOULD);
Ok, so here's a full example, which works for me:
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var directory = new RAMDirectory();

// Index the three documents from the question; only "id" is stored.
using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "John Smith", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster data", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);

    doc = new Document();
    doc.Add(new Field("id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "Alan Smith", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster drive", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);

    doc = new Document();
    doc.Add(new Field("id", "3", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "Mike Std", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster check", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
}

var words = new[] {"John", "data", "check"};
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] {"name", "additionalData"}, analyzer);

// One SHOULD clause per word: a document matches if any word matches.
var mainQuery = new BooleanQuery();
foreach (var word in words)
    mainQuery.Add(parser.Parse(word), Occur.SHOULD); // Should probably use parser.Parse(QueryParser.Escape(word)) instead

using (var searcher = new IndexSearcher(directory))
{
    var results = searcher.Search(mainQuery, null, int.MaxValue, Sort.RELEVANCE);
    var idFieldSelector = new MapFieldSelector("id");

    foreach (var scoreDoc in results.ScoreDocs)
    {
        var doc = searcher.Doc(scoreDoc.Doc, idFieldSelector);
        Console.WriteLine("Found: {0}", doc.Get("id"));
    }
}
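As the comment in the loop suggests, words that come from user input can contain query-syntax characters such as ( or :, which the parser would try to interpret. Below is a minimal sketch of the escaped variant, assuming the same Lucene.Net 3.0 setup as above. The input "(John" is a made-up example of a word with an unbalanced parenthesis.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] {"name", "additionalData"}, analyzer);

// Hypothetical user input; "(John" contains an unbalanced parenthesis.
var words = new[] {"(John", "data", "check"};

var mainQuery = new BooleanQuery();
foreach (var word in words)
{
    // QueryParser.Escape turns query-syntax characters into literals,
    // so the word is parsed as plain text rather than as query syntax.
    mainQuery.Add(parser.Parse(QueryParser.Escape(word)), Occur.SHOULD);
}
```

Note that escaping also covers * and ?, so it should not be combined as-is with a wildcard search: the wildcard characters would be treated as literal text.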
Upvotes: 3