Robert Kang

Reputation: 568

The proper Solr Tokenizer to tokenize text while preserving special characters

Which tokenizer would produce the following output?

input: "This-something is something."
output: ["] [This] [-] [something] [is] [something] [.] ["]

I tried solr.WordDelimiterFilterFactory, but it removes all the special characters. I also tried solr.KeepWordFilterFactory with all the special characters listed in keepwords.txt, but that doesn't work either.

Any suggestions? I am on Solr 3.4.

Upvotes: 1

Views: 429

Answers (1)

Jayendra

Reputation: 52779

I don't think there is an out-of-the-box tokenizer for your specific requirement.
You can write a custom tokenizer tailored to it and easily have Solr use it.
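
As a rough illustration of the splitting rule such a custom tokenizer would need, here is a minimal plain-Java sketch. It is not a Solr component: the class name `PunctuationPreservingSplitter` and the regular expression are assumptions made for illustration. Wiring the same rule into Solr would still require a Lucene `Tokenizer` subclass and a matching tokenizer factory referenced from the field type in schema.xml.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
public class PunctuationPreservingSplitter {

    // Assumed rule: a run of letters/digits is one token; every other
    // non-whitespace character becomes a token of its own.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher matcher = TOKEN.matcher(text);
        // Whitespace matches neither alternative, so find() skips over it.
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The input from the question, including the surrounding quotes.
        System.out.println(tokenize("\"This-something is something.\""));
        // prints: [", This, -, something, is, something, ., "]
    }
}
```

Inside a real tokenizer, the same matching loop would run in `incrementToken()` and emit one term per match instead of collecting the matches into a list.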

Upvotes: 2
