Thomas

Reputation: 34188

Guidelines for indexing data with Lucene.Net

I have gone through a short article on how to index data using Lucene.Net, but the meaning of a few lines of code was not clear to me. Those lines are:

Document doc = new Document();
doc.Add(new Field("ID", oData.ID.ToString() + "_" + oData.Type, Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("Title", oData.Title, Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("Description", oData.Description, Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("Url", oData.Url, Field.Store.YES, Field.Index.TOKENIZED));
writer.AddDocument(doc);

What is the meaning of this line?

doc.Add(new Field("ID", oData.ID.ToString() + "_" + oData.Type, Field.Store.YES, Field.Index.UN_TOKENIZED));

What is the meaning of Field.Index.UN_TOKENIZED and Field.Index.TOKENIZED?

If possible, please explain the importance of Field.Index.UN_TOKENIZED and Field.Index.TOKENIZED in detail.

Upvotes: 1

Views: 652

Answers (1)

andyp

Reputation: 6269

Lucene has deprecated TOKENIZED and UN_TOKENIZED; they are now named ANALYZED and NOT_ANALYZED.

NOT_ANALYZED means that the field's contents are not run through an analyzer. In effect, the whole value is treated as a single 'term' when searched, so only an exact match finds it. As an example of where this is useful, the documentation names unique product IDs (e.g. EANs or UPCs).
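
A rough way to picture NOT_ANALYZED, in plain C# rather than through Lucene.Net itself: the raw value becomes exactly one term, so a lookup only succeeds on an identical string. The helpers NotAnalyzedTerms and Matches, and the sample values 42 and "Article" (standing in for the question's oData.ID and oData.Type), are hypothetical; Lucene's real indexing path is far more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a NOT_ANALYZED field: the raw value is
// indexed as exactly one term, with no lowercasing and no splitting.
List<string> NotAnalyzedTerms(string value) => new List<string> { value };

// A term lookup matches only if the query term equals an indexed term exactly.
bool Matches(List<string> indexedTerms, string queryTerm) => indexedTerms.Contains(queryTerm);

// Composite key built like the question's ID field: ID + "_" + Type.
var idTerms = NotAnalyzedTerms(42.ToString() + "_" + "Article");

Console.WriteLine(string.Join("|", idTerms));       // 42_Article
Console.WriteLine(Matches(idTerms, "42_Article"));  // True
Console.WriteLine(Matches(idTerms, "42"));          // False: a partial value is not a term
Console.WriteLine(Matches(idTerms, "42_article"));  // False: case is preserved
```

This is why an identifier such as the question's ID_Type key is a natural fit for NOT_ANALYZED: splitting it into pieces would only make exact lookups harder.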

ANALYZED means that the field's contents are run through an analyzer and (possibly) broken down into more than one 'term'. The Lucene documentation mentions that this is useful for common text. The accepted answer to this question explains some commonly used analyzers very well.
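
To contrast with the single-term case, here is a toy sketch of what analysis does to a text field. The analyzer below is a hypothetical, heavily simplified stand-in (lowercase, then split on anything that is not a letter or digit); it is not Lucene's StandardAnalyzer, which does more, such as stop-word filtering. The sample title is simply this question's title.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical toy analyzer: lowercase the text and split it on every
// character that is not a letter or digit.
List<string> AnalyzedTerms(string value)
{
    var terms = new List<string>();
    var current = new StringBuilder();
    foreach (char c in value.ToLowerInvariant())
    {
        if (char.IsLetterOrDigit(c))
        {
            current.Append(c);
        }
        else if (current.Length > 0)
        {
            terms.Add(current.ToString()); // end of a word: emit it as a term
            current.Clear();
        }
    }
    if (current.Length > 0)
    {
        terms.Add(current.ToString()); // emit the trailing word, if any
    }
    return terms;
}

// The query text goes through the same analyzer; the field matches
// if any query term is among its indexed terms.
bool Matches(List<string> indexedTerms, string queryText) =>
    AnalyzedTerms(queryText).Any(t => indexedTerms.Contains(t));

var title = "Guide line for indexing data with Lucene.Net";
var titleTerms = AnalyzedTerms(title);
Console.WriteLine(string.Join("|", titleTerms)); // guide|line|for|indexing|data|with|lucene|net
Console.WriteLine(Matches(titleTerms, "LUCENE")); // True
Console.WriteLine(Matches(titleTerms, "search")); // False

// For contrast: indexed without analysis, the whole title is one term,
// so a one-word lookup such as "lucene" does not find it.
var wholeValue = new List<string> { title };
Console.WriteLine(wholeValue.Contains("lucene")); // False
```

Analyzing the query with the same analyzer as the field is what lets "LUCENE" find "Lucene.Net"; in real Lucene the choice of analyzer decides exactly which terms are produced.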

For further reference, see the Lucene.Net and Lucene documentation.

Upvotes: 3
