BLogan

Reputation: 866

Solr special characters not indexed

I've read a lot about special characters in Solr and escaping them with a backslash ("\"), but unless I use the string field type I haven't been able to get this to work.

I have an indexed text field that contains a[b as a value. I would like to search on this value and return only documents that contain that text (they don't need to contain only that text, but they do need to have those three characters in that order). Here are some queries I've tried and the parsedquery that Solr reports for each:

q=field:a\\[b
parsedquery: field:a field:b (seems to return anything that contains an a or a b)

q=field:"a\\[b"
parsedquery: PhraseQuery(field:"a b") (seems to return anything that contains a b)

I'm using text_general out of the box. I've tried some recommended changes, but so far no luck. Has anyone had this problem and found a way to make it work?

Upvotes: 3

Views: 5033

Answers (1)

Max

Reputation: 4077

Solr by default uses StandardTokenizerFactory to create tokens. While creating tokens, this tokenizer strips out extraneous characters (it tokenizes on most special characters). It is likely that Solr is tokenizing on '[', which is why you are not getting the expected result. That would also explain why you get the expected result only with the string type, since string fields are not analyzed at all. Try using WhitespaceTokenizerFactory instead of StandardTokenizerFactory. WhitespaceTokenizerFactory tokenizes only on whitespace, so you should be able to query your special characters (after escaping them).
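To see why the bracket disappears, here is a rough Python illustration of the two tokenization strategies (this is not Solr's actual code, just a sketch of the idea: the standard tokenizer splits on punctuation like '[', while whitespace tokenization keeps it inside the token):

```python
import re

text = "foo a[b bar"

# Roughly StandardTokenizer-like: split on anything that is not a word
# character, so 'a[b' falls apart into 'a' and 'b'.
standard_like = re.findall(r"\w+", text)
print(standard_like)  # ['foo', 'a', 'b', 'bar']

# Whitespace tokenization: split only on whitespace, so 'a[b' survives
# as a single token and can be matched by an escaped query.
whitespace = text.split()
print(whitespace)  # ['foo', 'a[b', 'bar']
```

Once the index contains the token a[b intact, an escaped query for it can match; with the standard tokenizer there is simply no such token to find.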

Do remember to specify the above tokenizer in the index analyzer as well as the query analyzer (in short, all analyzers defined for the field type).

An example:

http://www.pathbreak.com/blog/solr-text-field-types-analyzers-tokenizers-filters-explained
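For instance, a minimal whitespace-based field type in schema.xml might look like this (the text_ws name and the lowercase filter are illustrative choices, not something from the question):

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After switching the field to a type like this you will need to reindex; a query such as q=field:a\[b (with the bracket escaped) should then match documents containing a[b as a whole token.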

Upvotes: 3
