Reputation: 612
I'm working with a Solr instance that was set up earlier at my company, and it doesn't seem to be configured correctly. I can search for something like q=*Paper* and get results, but not for paper.
It seems like maybe the index-time tokenizer / filter isn't working as I'd expected.
The schema.xml is set up to tokenize and then index and query without case sensitivity, for example on this description field:
<field name="S_DSC" type="string_search" indexed="false" stored="true" required="false"/>
...etc...
<fieldType name="string_search" class="solr.TextField">
<analyzer type="index">
<!--Split at whitespaces and at punctuations. Strip other special characters.-->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!--Plural words handling. 'dogs'='dog'. Stemming not recommended. dry 'erase' board is not the same as dry board 'eraser'-->
<filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
</fieldType>
And the solrconfig.xml has the default qf set to:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="spellcheck">false</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.count">3</str>
<str name="spellcheck.maxCollations">1</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="defType">synonym_edismax</str>
<str name="synonyms">false</str>
<str name="qf">C_PN^20.0 PN^15.0 C_S_DSC^10.0 S_DSC^10.0 M_PN^5.0 DIM_NM^2.0 BRD^2.0 combined_search^1</str>
<str name="a">{!type=synonym_edismax qf=$qf v=$q}</str>
</lst>
When I query for q=* I get results:
select?q=*&rows=10&start=0&wt=json
"docs": [
{
"S_DSC": "Foo 8.5\" x 11\" Copy Paper, 20 lbs, 92 Brightness, 5000/Carton (123456)"
...etc...
},
But if I try to search on a term in the description (S_DSC), I don't get results unless it's case sensitive AND I put asterisks around it.
I get results for q=*Paper*
"parsedquery": "(+DisjunctionMaxQuery((combined_search:*paper* | PN:*Paper*^15.0 | S_DSC:*paper*^10.0 | C_PN:*Paper*^20.0 | BRD:*Paper*^2.0 | M_PN:*Paper*^5.0 | DIM_NM:*Paper*^2.0 | C_S_DSC:*paper*^10.0)))/no_coord",
No results for q=paper
"parsedquery": "(+DisjunctionMaxQuery((combined_search:paper | PN:paper^15.0 | S_DSC:paper^10.0 | C_PN:paper^20.0 | BRD:paper^2.0 | M_PN:paper^5.0 | DIM_NM:paper^2.0 | C_S_DSC:paper^10.0)))/no_coord",
No results for q=Paper
"parsedquery": "(+DisjunctionMaxQuery((combined_search:paper | PN:Paper^15.0 | S_DSC:paper^10.0 | C_PN:Paper^20.0 | BRD:Paper^2.0 | M_PN:Paper^5.0 | DIM_NM:Paper^2.0 | C_S_DSC:paper^10.0)))/no_coord",
Shouldn't it be tokenizing the S_DSC above and then lowercasing the tokens (so that paper is among them)?
What am I missing here? Appreciate any insight :)
Upvotes: 0
Views: 154
Reputation: 52792
Your S_DSC field is not indexed:
<field name="S_DSC" type="string_search" --> indexed="false" <--
An unindexed field will never generate a hit. My guess is that your hit is coming from one of the other, unprocessed fields which are indexed, and that's why you're getting the behaviour you're seeing.
When you append debug=all to your query, each returned document will show the term frequency matched (i.e. what makes up the score) for each field, allowing you to see which fields generated hits.
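If the goal is for S_DSC itself to be searchable, a minimal sketch of the change would look like the following. It assumes nothing else depends on the field staying unindexed; only the indexed attribute differs from the definition in the question.

```xml
<!-- Sketch only: the field definition from the question with indexed switched to "true". -->
<!-- Assumption: the string_search fieldType stays exactly as shown in the question. -->
<field name="S_DSC" type="string_search" indexed="true" stored="true" required="false"/>
```

Documents that are already in the index would need to be reindexed after this change, since no terms were written for the field while it was unindexed.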
Upvotes: 2