Reputation: 21
I have been struggling to get a PhraseQuery with a wildcard working in Solr v4.10.2. My field definition is below:
<!-- Search field -->
<field name="title" type="text_pt_en" indexed="true" stored="true" />
<!-- Field definition -->
<fieldType name="text_pt_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" enablePositionIncrements="true" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
    <filter class="solr.ReversedWildcardFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" enablePositionIncrements="true" />
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
  </analyzer>
</fieldType>
Suppose the following value (in Portuguese) is added to the index for the field above:
Teste de texto; Será quebrado em espaços em branco!
Based on the analyzer chain, the tokens added to the index are (taken from the Solr "Analysis" screen):
etset teste ;otxet texto; odarbeuq quebrado socapse espacos !ocnarb branco!
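As a rough cross-check of that token list, the index-time chain can be imitated outside Solr. This is only a sketch in Python, not Solr's actual implementation: the `STOPWORDS` set is an assumed three-word subset of `lang/stopwords_pt.txt`, the helper names `fold_ascii` and `analyze` are hypothetical, and the reversed tokens produced by `ReversedWildcardFilterFactory` are left out. It does show that whitespace tokenization keeps the punctuation attached, so the indexed term is `texto;` rather than `texto`, and that removed stop words leave position gaps.

```python
import unicodedata

# Assumed subset of lang/stopwords_pt.txt, just enough for this example.
STOPWORDS = {"de", "em", "será"}


def fold_ascii(token):
    """Strip diacritics, roughly like ASCIIFoldingFilterFactory."""
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def analyze(text, stopwords=STOPWORDS):
    """Return (position, term) pairs; removed stop words leave position gaps."""
    terms = []
    # str.split() with no arguments splits on whitespace only,
    # so punctuation such as ";" and "!" stays attached to the word.
    for position, raw in enumerate(text.split(), start=1):
        lowered = unicodedata.normalize("NFC", raw.lower())
        if lowered in stopwords:
            continue  # stop word dropped, but its position is still consumed
        terms.append((position, fold_ascii(lowered)))
    return terms


indexed = analyze("Teste de texto; Será quebrado em espaços em branco!")
query = analyze("teste de texto")

print(indexed)
print(query)
# The query term "texto" is not among the indexed terms; "texto;" is.
print("texto" in {term for _, term in indexed})
```

Under these assumptions, the query terms line up with the indexed positions (1 and 3), but the second term differs from the indexed one by the trailing `;`.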
Today, I can search, for example:
title:teste
title:(teste texto)
title:(teste de texto)
title:("teste de texto;") // (PhraseQuery) matches because of ";" in the end of the string
But if I try either of these PhraseQuery searches:
title:("teste de texto")
"parsedquery": "PhraseQuery(title:\"teste ? texto\")"
title:("teste de texto*")
"parsedquery": "PhraseQuery(title:\"teste ? texto*\")"
No results are returned.
I have read about possible solutions to this, but none of them seems to work:
I also can't understand why the query with a trailing wildcard ("*") does not work: it returns no results.
Some comments:
Could you please help me understand what is happening, whether there is a way to make a PhraseQuery with a wildcard work, and what my options are?
Please, let me know if you need further information and thanks a lot for your attention and help!
Upvotes: 0
Views: 1939
Reputation: 21
I found a solution to my problem with the configuration below:
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory" />
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
  <filter class="solr.ReversedWildcardFilterFactory" />
</analyzer>
Searching with the Complex Phrase Query Parser, as below, now returns the desired document:
{!complexphrase df=title}"teste de texto*"
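For anyone sending this query from a client rather than the admin UI, the local-params prefix has to be URL-encoded along with the rest of `q`. Here is a minimal Python sketch that only builds the request URL. The base URL `http://localhost:8983/solr`, the core name `collection1`, and the helper `build_select_url` are assumptions for illustration, not part of my setup; adjust them to your installation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, core, phrase, field="title"):
    """Build a /select URL whose q parameter uses the complexphrase parser.

    The phrase is placed inside double quotes as-is, so it must not
    contain a double quote itself (this sketch does no escaping).
    """
    query = '{!complexphrase df=%s}"%s"' % (field, phrase)
    params = urlencode({"q": query, "wt": "json"})
    return "%s/%s/select?%s" % (base_url.rstrip("/"), core, params)


# Hypothetical host and core name, used only for this example.
url = build_select_url("http://localhost:8983/solr", "collection1", "teste de texto*")
print(url)

# Decoding the URL again gives back the query string shown above.
print(parse_qs(urlsplit(url).query)["q"][0])
```

The resulting URL can then be fetched with any HTTP client.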
I think the problem with my previous field setup was the StopFilterFactory, as the Complex Phrase Query Parser documentation states: "It is recommended not to use stopword elimination with this query parser." [1]
I've done some tests and, so far, this setup fits my needs (queries).
[1] https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
Upvotes: 2