Reputation: 49
I have a problem searching a document in Solr by a query.
The document looks like this:
{
"id": "890_03366_00739",
"text": ["2509412 MARCO GLLMRC86E28L736X 03366 00739 "],
"_version_": 1612212288969769000
}
If I search with the query text:GLLMRC86E28L736
the document is found correctly.
If I try the query text:GLLMRC86E28L736X
the document is not found. Why does this happen?
In my schema the field text is declared as <field name="text" type="text_general" indexed="true" required="true" stored="true"/>
I'm using Solr 7.0.0.
UPDATE:
The "Analysis" page shows this output for my field "text" and the query GLLMRC86E28L736X:
[Screenshot: Search by GLLMRC86E28L736X]
And for the query GLLMRC86E28L736:
[Screenshot: Search by GLLMRC86E28L736]
The field type "text_general" is declared as
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
Upvotes: 0
Views: 123
Reputation: 52822
Your EdgeNGramFilterFactory has maxGramSize="15", which cuts off the end of the token: GLLMRC86E28L736X is 16 characters long, so the trailing X is dropped when indexing, while it is kept when querying (as it should be, if you're attempting to match prefixes).
On the left side of the analysis screen you can see that it generates prefixes of GLLMRC86E28L736X, but it stops before adding the last character. The query is still GLLMRC86E28L736X, and since no indexed token matches GLLMRC86E28L736X (the longest one is GLLMRC86E28L736, where generation stopped), you get no hit.
Increase maxGramSize for your field so it covers your longest token, or search against a field that doesn't apply edge n-grams if you want exact matches only.
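The mismatch can be reproduced outside Solr with a rough Python sketch. It is a simplified model, not Solr's implementation: it only imitates lowercasing plus edge n-gram generation on the index side and lowercasing on the query side, ignoring the tokenizer, stop words, synonyms and stemming. The function names edge_ngrams and matches are made up for this illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def matches(indexed_value, query, max_gram):
    """Simplified model: does the query term equal any indexed term?"""
    # Index side: lowercase, then edge n-grams (assumed minGramSize of 2).
    indexed_terms = set(edge_ngrams(indexed_value.lower(), 2, max_gram))
    # Query side: lowercase only, no n-grams.
    return query.lower() in indexed_terms


token = "GLLMRC86E28L736X"  # 16 characters

print(matches(token, "GLLMRC86E28L736", 15))   # True: the 15-character prefix is indexed
print(matches(token, "GLLMRC86E28L736X", 15))  # False: the 16-character term is never indexed
print(matches(token, "GLLMRC86E28L736X", 20))  # True once max_gram covers the full token
```

With max_gram at 15 the longest indexed term is the 15-character prefix, so the full 16-character query term has nothing to match; raising the limit past the token length makes the full value searchable again.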
In addition, this is not the default definition of the text_general field type included in the examples, if I remember correctly, so in the future it will be helpful to include the field type definition in the question as well.
Upvotes: 1