Reputation: 7150
I have the name "Audioslave" indexed in Solr, and I want to match that document to the query string "Audio Slave".
I have the following field type configured:
<fieldType name="text_filter" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            preserveOriginal="1"
            generateWordParts="1"
            generateNumberParts="1" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            preserveOriginal="1"
            generateWordParts="1"
            generateNumberParts="1" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
And a field using it:
<field name="artist_name_filter" type="text_filter" multiValued="false" indexed="true" stored="true" required="false" />
When I use the Solr analysis tool, everything looks good.
The query part produces the following: Audio Slave, Audio Slave, Audio, AudioSlave and Slave. Let's just follow the 3rd column (AudioSlave) from here: AudioSlave, then audioslave.
On the other hand, the index part is: Audioslave, Audioslave, then audioslave.
So both sides end up as audioslave and should match, but the query returns no results:
http://localhost:8983/solr/search_api/select?defType=edismax&fq=type:Artist&q=Audio%20slave&qf=artist_name_filter&wt=json
Upvotes: 2
Views: 2513
Reputation: 8658
Try using WhitespaceTokenizerFactory as the tokenizer for your index analyzer.
Here the KeywordTokenizerFactory keeps the text as it is: it emits the whole input as a single token and does not split it.
Replace it with WhitespaceTokenizerFactory, which creates tokens at whitespace.
Upvotes: 0
Reputation: 33341
Your problem isn't analysis, it's QueryParser syntax. Spaces are used to separate query clauses, and that isn't affected by the analyzer. When you have q=Audio slave, the parser applies query syntax rules first, separating it into the clauses "Audio" and "slave", and then analyzes each clause separately.
Escaping the space should do the job, I believe: q=Audio\ slave
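To make the escaping concrete, here is a minimal Python sketch that builds the request URL from the question with the space backslash-escaped. The helper name build_select_url is hypothetical, and I'm assuming the same core, filter query and field as in the question. Note that the backslash itself has to be URL-encoded (as %5C) when it goes into the query string.

```python
from urllib.parse import urlencode, quote


def build_select_url(base_url, raw_query):
    """Build a Solr select URL with spaces in the query backslash-escaped.

    Hypothetical helper: the parameters mirror the request in the question.
    """
    # Escape each space so the query parser keeps the input as one clause
    # instead of splitting it into separate clauses before analysis.
    escaped = raw_query.replace(" ", "\\ ")
    params = {
        "defType": "edismax",
        "fq": "type:Artist",
        "q": escaped,
        "qf": "artist_name_filter",
        "wt": "json",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_select_url(
        "http://localhost:8983/solr/search_api/select", "Audio slave"))
```

The q parameter in the resulting URL is Audio%5C%20slave, which Solr decodes to Audio\ slave before parsing.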
A phrase query here seems like it ought to work, such as q="Audio slave", but it doesn't. It generates something like "(audio slave audio audioslave) slave" for me, which is problematic.
Upvotes: 2