Reputation: 91
I am using Lucene 3.6 and StandardAnalyzer in my project for indexing and searching. This analyzer splits the search query string on all special characters (@, #, -, _).
For example, if I search for "[email protected] #2nd place", the tokenizer produces these tokens: [somename][gmail][com][2nd][place]. But I need tokens like these: [somename@gmail][com][#2nd][place].
So how can I exclude such special characters from the characters the tokenizer splits on?
One more question: do I need to re-index everything with the new analyzer, or can I just use the new analyzer with the old index?
Thanks!
Upvotes: 1
Views: 315
Reputation: 26703
StandardAnalyzer uses StandardTokenizer for defining grammar rules (word breaks etc.). Documentation of the latter says:
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
Quickly peeking into the StandardTokenizer code, I would guess that removing "<EMAIL>" from TOKEN_TYPES might be sufficient. Or maybe not :-)
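And yes, you will need to reindex: the existing index holds the tokens produced by the old analyzer, so query tokens produced by a new analyzer would not match them.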
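To make the desired splitting rule concrete, here is a minimal plain-Java sketch of it, independent of Lucene. The class name KeepSpecialCharsTokenizer, its methods, and the sample input in the note below are hypothetical and only illustrate the idea: letters, digits, and the four characters from the question (@, #, -, _) stay inside a token, and everything else (including '.') splits. It does not lowercase or remove stop words the way StandardAnalyzer does. In Lucene 3.x the same predicate could presumably go into a CharTokenizer subclass by overriding isTokenChar(int), but that is an assumption and is not tested here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows the token rule asked for in the question.
final class KeepSpecialCharsTokenizer {

    // Assumed rule: letters, digits, and '@', '#', '-', '_' belong to a token;
    // any other character (space, '.', ...) ends the current token.
    static boolean isTokenChar(int c) {
        return Character.isLetterOrDigit(c)
                || c == '@' || c == '#' || c == '-' || c == '_';
    }

    // Splits the text into tokens according to isTokenChar.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                // A non-token character closes the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this rule, calling KeepSpecialCharsTokenizer.tokenize("user@example.com #2nd place") yields the tokens user@example, com, #2nd, place. Whatever rule is chosen, the same one has to be applied at index time and at query time.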
Upvotes: 2