dcarneiro

Reputation: 7150

How to ignore whitespace in a Solr query

I have the name Audioslave indexed in Solr, and I want that document to match the query string Audio Slave.

I have the following field type configured:

<fieldType name="text_filter" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            preserveOriginal="1"
            generateWordParts="1"
            generateNumberParts="1"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            preserveOriginal="1"
            generateWordParts="1"
            generateNumberParts="1"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

And a field using it:

<field name="artist_name_filter"  type="text_filter"  multiValued="false" indexed="true" stored="true" required="false" />

In the Solr analysis tool, everything looks good.

The Query part is the following:

On the other hand, the index part is:

So both fields should match, but the query returns no results:

http://localhost:8983/solr/search_api/select?defType=edismax&fq=type:Artist&q=Audio%20slave&qf=artist_name_filter&wt=json

Upvotes: 2

Views: 2513

Answers (2)

Abhijit Bashetti

Reputation: 8658

Try using WhitespaceTokenizerFactory as the tokenizer for the index analyzer. KeywordTokenizerFactory keeps the text as it is: it emits the whole input as a single token instead of splitting it.

Replace it with WhitespaceTokenizerFactory, which splits the text into tokens at whitespace.

Upvotes: 0

femtoRgon

Reputation: 33341

Your problem isn't analysis, it's QueryParser syntax. Spaces are used to separate query clauses, and that isn't affected by the analyzer. When you have q=Audio slave, it applies query syntax rules first, and separates it into clauses "Audio" and "slave", and then analyzes each clause separately.

Escaping the space should do the job, I believe: q=Audio\ slave
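
For reference, here is a rough sketch of how the escaped query could be built and URL-encoded from client code. The helper names escape_spaces and build_select_url are made up for illustration and are not part of Solr; the host, core name, and parameters are copied from the URL in the question.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the question; adjust to your own host and core.
SOLR_SELECT = "http://localhost:8983/solr/search_api/select"


def escape_spaces(query: str) -> str:
    """Put a backslash before each space so the query parser
    treats the whole input as a single clause."""
    return query.replace(" ", "\\ ")


def build_select_url(user_input: str) -> str:
    """Build the select URL from the question with the space escaped."""
    params = {
        "defType": "edismax",
        "fq": "type:Artist",
        "q": escape_spaces(user_input),
        "qf": "artist_name_filter",
        "wt": "json",
    }
    # quote_via=quote encodes the space as %20 (not +) and the backslash as %5C.
    return SOLR_SELECT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_select_url("Audio slave"))
```

The resulting URL carries q=Audio%5C%20slave, which Solr decodes back to Audio\ slave before parsing. This only escapes spaces; other query syntax characters in user input would need their own handling.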

A phrase query here seems like it ought to work, such as q="Audio slave", but it doesn't. It generates something like: "(audio slave audio audioslave) slave" for me, which is problematic.

Upvotes: 2
