Reputation: 2624
I am new to Lucene.NET. I am adding fields to a Lucene document as Field.Index.NOT_ANALYZED. There is one default field, which is added to the document as Field.Index.ANALYZED.
I have no difficulty searching the default field, but when I search on a specific field, Lucene returns 0 documents. However, if I change Field.Index.NOT_ANALYZED to Field.Index.ANALYZED, things work properly. I think it has something to do with the Analyzer. Can anybody guide me on how to search a Field.Index.NOT_ANALYZED field?
Here is how I am creating the query parser:
QueryParser parser = new QueryParser(
    Version.LUCENE_30,
    "content",
    new StandardAnalyzer(Version.LUCENE_30));
Upvotes: 5
Views: 6978
Reputation: 6134
The issue seems to be that the search values do not literally match the values that were indexed; in other words, you are trying to match a document containing hello world with a search for Hello World. Since those fields are marked as NOT_ANALYZED, Lucene does not process their values with an analyzer/tokenizer; it simply indexes them as they are passed, storing a string like hello world as the single term hello world. For a search to return a match on that document, the search term needs to be exactly hello world, and not Hello World, "hello world." (with a trailing period), or Hello. All of these searches will return zero matches. For Lucene, it would be like searching for the number 3 and expecting a match on 2 or 4 (as illogical as it might sound).
This is why NOT_ANALYZED is only recommended for ID-type fields, where you want the search to return an exact match rather than a list of related or similar field values.
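The exact-match behaviour described above can be checked with a small in-memory index. This is only a sketch, assuming Lucene.NET 3.0.3; the field name "title" and the class name are made up for illustration and do not come from the question.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

class ExactMatchDemo
{
    static void Main()
    {
        var dir = new RAMDirectory();

        // Index one document; "title" (hypothetical field name) is NOT_ANALYZED,
        // so the whole value is stored as a single, unmodified term.
        using (var writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("title", "hello world", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }

        using (var searcher = new IndexSearcher(dir, true))
        {
            // Only the literal value that was indexed should produce a hit.
            foreach (var text in new[] { "hello world", "Hello World", "hello world.", "Hello" })
            {
                var query = new TermQuery(new Term("title", text));
                int hits = searcher.Search(query, 10).TotalHits;
                Console.WriteLine("\"{0}\" -> {1} hit(s)", text, hits);
            }
        }
    }
}
```

Under these assumptions, only the first search text should report a hit; the other three differ from the indexed term by case, punctuation, or length.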
The advantage of using ANALYZED is that searching becomes more intuitive and friendly. Indexing a value like hello world breaks it down into tokens (so that a search for hello or world alone can match), and an analyzer such as StandardAnalyzer stores them in lowercase to avoid mismatches due to different casing (like Hello World or hELLO). Note that the tokens are whole words, so a fragment like ello does not match on its own.
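To see what an analyzer actually does to a value, you can print the tokens it produces. This is a minimal sketch, assuming Lucene.NET 3.0.3 and StandardAnalyzer (the analyzer used in the question); the field name "title" is made up and is only passed because the API requires one.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

class AnalyzerDemo
{
    static void Main()
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // Run the text through the analyzer the same way an ANALYZED field would be processed.
        var stream = analyzer.TokenStream("title", new StringReader("Hello World"));
        var term = stream.AddAttribute<ITermAttribute>();

        // Each iteration exposes one token that would end up in the index.
        while (stream.IncrementToken())
        {
            Console.WriteLine(term.Term);
        }
    }
}
```

With StandardAnalyzer this should list two separate lowercase tokens, which is why a search for either word alone, in any casing that goes through the same analyzer, can match an ANALYZED field.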
Upvotes: 2
Reputation: 19781
ANALYZED just means that the value is passed through an Analyzer before being indexed, while NOT_ANALYZED means that the value will be indexed as-is. The latter means that a value like "hello world" will be indexed as exactly that, the single string "hello world". However, the QueryParser syntax treats spaces as term separators, so it creates two terms, "hello" and "world".
You will be able to match the field if you create the query with var q = new TermQuery(new Term(field, "hello world")) instead of calling var q = queryParser.Parse(field, "hello world").
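The difference between the two query styles can be made visible by printing what each one produces. This is a rough sketch, assuming Lucene.NET 3.0.3, where QueryParser.Parse takes the query string as its only argument; the field name "title" is hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

class ParserVsTermQuery
{
    static void Main()
    {
        // "title" is a made-up default field for this illustration.
        var parser = new QueryParser(Version.LUCENE_30, "title",
                                     new StandardAnalyzer(Version.LUCENE_30));

        // The parser splits on whitespace and analyzes each piece separately.
        Query parsed = parser.Parse("hello world");

        // A TermQuery bypasses parsing and analysis: the text is used as one term.
        Query exact = new TermQuery(new Term("title", "hello world"));

        Console.WriteLine("{0}: {1}", parsed.GetType().Name, parsed);
        Console.WriteLine("{0}: {1}", exact.GetType().Name, exact);
    }
}
```

The parsed query should come out as a combination of two single-word terms, neither of which exists in a NOT_ANALYZED field holding "hello world", whereas the TermQuery carries the full string as one term and can therefore match it.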
Upvotes: 14