Bick

Reputation: 18551

Solr - can't query special characters or numbers

In the solr field PackageTag

<field name="PackageTag" type="text_en_splitting" indexed="true" stored="true" required="false" multiValued="true"/>

I have the following value

"playing @@*"

Now when I search for "play", I get it in my results.
But when I search with "@@*", I do not. It is removed by the word delimiter.

Is there a way I can let the user search on these special characters but still use word delimiting?

Upvotes: 0

Views: 935

Answers (3)

Shivan Dragon

Reputation: 15229

There are two issues here:

  • first off, you must create your own fieldType in Solr and configure it to NOT use "@" and "*" as stopwords:

in schema.xml do something like this:

<types>
        <fieldType name="myTextFieldType" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />             
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" />
            </analyzer>
        </fieldType>
</types>

You must then use that fieldType for the "PackageTag" field:

<field name="PackageTag" type="myTextFieldType" indexed="true" stored="true" required="false" multiValued="true"/>

  • Then, in the "conf" dir (the same dir where schema.xml is located), create or edit the stopwords.txt file and add "@" and "*" to it. Just put them in there, each character on one line:

    @

    *

Now, since the "*" character is also a special character for Lucene queries (wildcard), you need to escape it in your queries. You can escape "*" by replacing it with "\*". Something like this:

PackageTag:bla\*

to search for fields containing "bla*".
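Escaping like this is easy to get wrong by hand, so it is usually done in client code before the term is placed into the query string. A minimal sketch of such a helper (the function name and the exact character set are illustrative, not part of Solr or any client library):

```python
# Characters that have special meaning in the Lucene query syntax and
# therefore need a backslash prefix when meant literally. Note that "&&"
# and "||" are two-character operators; escaping each character is harmless.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\/&|')

def escape_query(term: str) -> str:
    """Prefix each Lucene special character with a backslash."""
    return ''.join('\\' + ch if ch in LUCENE_SPECIAL else ch for ch in term)

print(escape_query('bla*'))         # → bla\*
print(escape_query('playing @@*'))  # → playing @@\*
```

The escaped term can then be used directly, e.g. `PackageTag:` followed by `escape_query(user_input)`.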

Upvotes: 1

Maurizio In denmark

Reputation: 4284

You have to add your word-delimiter characters to the protwords.txt file and then apply a filter that uses those protected words at index and query time (for example, solr.WordDelimiterFilterFactory with the protected="protwords.txt" parameter).

That way they will be tokenized as you want and not removed at query time.
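A sketch of what such a fieldType could look like in schema.xml (the fieldType name "text_protected" and the filter attribute values are illustrative; only `protected="protwords.txt"` is the part this answer relies on):

```xml
<!-- Sketch only: tokens listed in protwords.txt are passed through
     WordDelimiterFilterFactory unchanged at index and query time. -->
<fieldType name="text_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With an entry such as "playing @@*"'s token in protwords.txt, that token would survive both indexing and querying intact.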

Upvotes: 0

Vidya

Reputation: 30310

I don't recall the full list of Lucene special characters, but did you try escaping with a backslash (\) before the character?

If that doesn't work, you might want to take a look at the Analyzer you are using to index your fields. StandardAnalyzer might do something funny with your special characters, so you could consider another analyzer or roll your own.

Upvotes: 0
