Silvio Giuliani

Reputation: 41

Solr - Exact Match on solr.TextField

Is there a practical way to do an exact-match search on a stemmed full-text field? I have a scenario in which I need a field to be indexed and searchable regardless of the case or whitespace used. Even using KeywordTokenizerFactory on both index and query, all my exact-match searches stopped working. Is there a way to search for an exact match, as on a string field, while at the same time applying custom tokenizers to that field? Below is the schema I am currently using:

<field name="subtipoimovel" type="buscalimpaquery" indexed="true" stored="true" />

<fieldType name="buscalimpaquery" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern=" " replacement="-"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
</fieldType>

Regards, Silvio Giuliani

Upvotes: 4

Views: 4287

Answers (3)

Silvio Giuliani

Reputation: 41

Apparently the problem was this tokenizer:

"solr.KeywordTokenizerFactory"

I changed it to StandardTokenizerFactory and now exact matches work.

I read the description of KeywordTokenizerFactory on the Solr wiki, and it seemed to me that for exact matching I should use it rather than StandardTokenizerFactory.

Does anyone know why this happens?

Upvotes: 0

Nick Zadrozny

Reputation: 7944

As Srikanth notes in a comment, you should consider splitting the different kinds of term analysis into two separate fields. See also my answer to a functionally similar question: Solr: combining EdgeNGramFilterFactory and NGramFilterFactory.
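A minimal sketch of the two-field approach for schema.xml, assuming the fragment is placed inside the schema element. The field name subtipoimovel_exato and the field type names buscaexata and buscatexto are hypothetical, chosen only for illustration. copyField copies the raw input value into the second field, where it is analyzed independently, so one field can hold the whole normalized value while the other holds individual words.

```xml
<!-- Hypothetical names: subtipoimovel_exato, buscaexata and buscatexto are illustrative only. -->

<!-- Full-text field: the value is split into words, folded and lowercased. -->
<field name="subtipoimovel" type="buscatexto" indexed="true" stored="true" />

<!-- Exact-match field: the whole value is kept as one normalized token. -->
<field name="subtipoimovel_exato" type="buscaexata" indexed="true" stored="false" />

<!-- Copies the raw (unanalyzed) input of subtipoimovel into subtipoimovel_exato. -->
<copyField source="subtipoimovel" dest="subtipoimovel_exato" />

<fieldType name="buscaexata" class="solr.TextField" positionIncrementGap="100">
    <!-- No type attribute: the same chain runs at index and query time. -->
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Collapses any run of whitespace into a single hyphen. -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="-"/>
    </analyzer>
</fieldType>

<fieldType name="buscatexto" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- A stemming filter could be added here for the full-text field only. -->
    </analyzer>
</fieldType>
```

Exact searches would then target the exact field with a quoted value, e.g. subtipoimovel_exato:"Casa Grande" (a made-up value), so the query parser hands the whole string to the KeywordTokenizer instead of splitting it on whitespace first. Word-level searches keep using subtipoimovel.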

Upvotes: 0

Arun

Reputation: 1787

The problem is that at index time you use KeywordTokenizerFactory, ASCIIFoldingFilterFactory, LowerCaseFilterFactory and PatternReplaceFilterFactory, but at query time you use only KeywordTokenizerFactory. That will not work well for exact matches. Think of these as a pipeline of processors: the query value must go through "similar" processing at query time, or it will not produce the same token that was indexed.
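Following that advice, one way to make the two pipelines identical is to declare a single analyzer with no type attribute, which Solr applies at both index and query time. A minimal sketch that keeps the question's own filter chain unchanged:

```xml
<!-- A single analyzer element without a type attribute is used for both indexing and querying. -->
<fieldType name="buscalimpaquery" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <!-- Keeps the whole field value as one token. -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- Maps accented characters to their ASCII equivalents. -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Replaces each space with a hyphen. -->
        <filter class="solr.PatternReplaceFilterFactory" pattern=" " replacement="-"/>
    </analyzer>
</fieldType>
```

With this, a quoted query value such as "Casa Grande" (a made-up example) is folded, lowercased and hyphenated to casa-grande, the same token produced at index time. The Analysis screen in the Solr admin UI is a convenient way to confirm that both sides emit the same token.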

Upvotes: 1
