vintubr

Solr PhraseQuery With Wildcard

I have been struggling to get a PhraseQuery with a wildcard working in Solr v4.10.2. My field definition is below:

<!-- Search field -->
<field name="title" type="text_pt_en" indexed="true" stored="true" />

<!-- Field definition -->
<fieldType name="text_pt_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />

        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
        <filter class="solr.ReversedWildcardFilterFactory" />
    </analyzer>

    <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory" />

        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
    </analyzer>
</fieldType>

Suppose the following value (in Portuguese) is indexed in the field above:

Teste de texto; Será quebrado em espaços em branco!

Based on the analyzer chain, the tokens added to the index are (from the Solr "Analysis" screen):

etset teste ;otxet texto; odarbeuq quebrado socapse espacos !ocnarb branco!

Today, I can search, for example:

title:teste
title:(teste texto)
title:(teste de texto)
title:("teste de texto;") // (PhraseQuery) matches because of the ";" at the end of the string

But if I try the following phrase queries (PhraseQuery):

title:("teste de texto")
    "parsedquery": "PhraseQuery(title:\"teste ? texto\")"

title:("teste de texto*")
    "parsedquery": "PhraseQuery(title:\"teste ? texto*\")"

No results are returned.

I have read about possible solutions to this, but none of them seems to work.

I also can't understand why the query with the trailing wildcard ("*") does not work: it returns no results either.


Could you please help me understand what is happening, whether there is a way to make a PhraseQuery with a wildcard work, and what my options are?

Please let me know if you need further information, and thanks a lot for your attention and help!


Answers (1)

vintubr

I solved my problem with the index analyzer configuration below, which drops the StopFilterFactory:

<analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />

    <tokenizer class="solr.WhitespaceTokenizerFactory" />

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
    <filter class="solr.ReversedWildcardFilterFactory" />
</analyzer>

Searching with the Complex Phrase Query Parser, as below, now returns the desired document:

{!complexphrase df=title}"teste de texto*"
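
If repeating the local params on every request becomes tedious, the same parser and default field could probably be set as request handler defaults in solrconfig.xml. This is only a minimal sketch and not part of the setup described here: the handler name /select-phrase is hypothetical, and it assumes a Solr version (4.8 or later) in which the complexphrase parser is registered by default.

```xml
<!-- Sketch for solrconfig.xml; the handler name "/select-phrase" is a hypothetical example -->
<requestHandler name="/select-phrase" class="solr.SearchHandler">
    <lst name="defaults">
        <!-- Parse q with the Complex Phrase Query Parser instead of the default parser -->
        <str name="defType">complexphrase</str>
        <!-- Default field, equivalent to df=title in the local params above -->
        <str name="df">title</str>
    </lst>
</requestHandler>
```

With a handler like this, sending q="teste de texto*" to /select-phrase should be equivalent to the local-params query above.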

I think the problem with my previous field setup was the StopFilterFactory. The Complex Phrase Query Parser documentation states: "It is recommended not to use stopword elimination with this query parser." [1]

I've done some tests and, so far, this setup fits my needs (queries).
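
Putting this together, the whole field type might look like the sketch below. Only the index analyzer is shown above, so the query analyzer here is an assumption: it is the original query chain with the StopFilterFactory removed, so that stopwords such as "de" are kept on both sides and phrase positions line up.

```xml
<fieldType name="text_pt_en" class="solr.TextField" positionIncrementGap="100">
    <!-- Index analyzer as in the solution above: no stopword removal -->
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
        <filter class="solr.ReversedWildcardFilterFactory" />
    </analyzer>

    <!-- Assumption: query analyzer mirrors the original one, minus StopFilterFactory -->
    <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false" />
    </analyzer>
</fieldType>
```

Documents need to be reindexed after a change to the index analyzer, since existing tokens were produced by the old chain.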

[1] https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
