Reputation: 190
I need to index some documents with a custom tokenizer. My sample doc looks like this:
"I love to live in New York"
and my list of expressions is:
["new york", "good bye", "cold war"]
Is there any way to tokenize the string normally while keeping the expressions from my list as single tokens? The desired output is:
["I", "love", "to", "live", "in", "New York"]
Upvotes: -1
Views: 53
Reputation: 32386
Yes, but you need to supply your data set in the analyzer definition. Since there is no pattern in your data set, listing the expressions explicitly is the only way to exclude them from the normal tokenization process. Adding a working sample of your data set to the question would be helpful.
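To illustrate the idea outside of any particular search engine, here is a minimal Python sketch of a tokenizer that tries the listed expressions first (case-insensitively) and only falls back to splitting on whitespace when none of them match. The `phrases` list and the `tokenize` helper are hypothetical names for this example, not part of any library API:

```python
import re

# Hypothetical list of protected expressions from the question
phrases = ["new york", "good bye", "cold war"]

# Try longer phrases first so they win over shorter overlapping ones,
# then fall back to any run of non-whitespace characters.
pattern = re.compile(
    "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True)) + r"|\S+",
    re.IGNORECASE,
)

def tokenize(text):
    """Return tokens, keeping protected expressions as single tokens."""
    return pattern.findall(text)

print(tokenize("I love to live in New York"))
# ['I', 'love', 'to', 'live', 'in', 'New York']
```

In an actual analyzer definition the same effect is usually achieved by pre-processing the text (for example, a character-mapping step that joins each protected expression into one unbreakable token) before the standard tokenizer runs.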
Upvotes: 1