Reputation: 568
Which tokenizer should I use to get the following result:
input: "This-something is something."
output: ["] [This] [-] [something] [is] [something] [.] ["]
I tried with solr.WordDelimiterFilterFactory, but this removes all the special characters. Also tried solr.KeepWordFilterFactory, with all the special characters in keepwords.txt. But this doesn't work either.
Any suggestions? I am on Solr 3.4.
Upvotes: 1
Views: 429
Reputation: 52779
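I don't think there is an out-of-the-box tokenizer for your specific requirement.
You can write a custom tokenizer that emits each punctuation character as its own token, and have Solr use it by referencing its factory class in the analyzer of the field type in schema.xml.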
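To give an idea of the splitting logic such a custom tokenizer would need, here is a minimal sketch in plain Java. It is not wired into Lucene or Solr: the class name PunctuationSplitter, the method tokenize, and the regular expression are my own assumptions for illustration, and you would still have to wrap the same logic in a Lucene Tokenizer subclass plus a factory class to plug it into your schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
class PunctuationSplitter {

    // A token is either a run of letters/digits, or exactly one
    // character that is neither a letter, a digit, nor whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    // Returns the tokens in order of appearance; whitespace is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The input from the question, including the surrounding quotes.
        System.out.println(tokenize("\"This-something is something.\""));
        // prints [", This, -, something, is, something, ., "]
    }
}
```

Each special character comes out as a separate token, so "--" would give two tokens. If you want runs of punctuation kept together, the second alternative of the pattern would need a + quantifier.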
Upvotes: 2