KallDrexx

Reputation: 27803

How can I do a string search in solr that allows wildcards, whitespace characters, and is case insensitive?

I am attempting to run the following query against my solr system:

((((subtype:place) AND name:fis*) AND addressPostal:98007) AND addressLine1:14320\ 21*)

This query is meant to find businesses whose name starts with "fis" (the first 3 characters) and whose address starts with "14320 21" (the first 8 characters).

This returns no matches. However, if I change fis* to Fis*, it returns the correct match. After further investigation, it appears that string fields are case sensitive.

I then tried to define my fields so that they would be case insensitive, allow wildcard searches (or at least starts-with searches), and not break on whitespace. Unfortunately, I have failed.

The closest I have gotten so far is:

<fieldType name="lowerCaseString" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this I can run ((((subtype:place) AND name:fis*) AND addressPostal:98007) AND addressLine1:14320*) and get the correct match, but I cannot search on the full 8 characters of the address because of the space. Since most addresses have only a few digits before the first space, this is a major issue.

The addressLine1 search needs to be case insensitive too, as I need st == ST == St.

How can I accomplish this?

Upvotes: 0

Views: 1205

Answers (2)

Maurizio In denmark

Reputation: 4284

Make a text field that uses the KeywordTokenizerFactory. This tokenizer does no actual tokenizing: the entire input string is preserved as a single token, which the LowerCaseFilterFactory then lowercases. The result behaves like a lowercase string field:

<fieldType name="lowerCaseString" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType> 
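A minimal sketch of putting the type to use, assuming the field names from the question (`name`, `addressLine1`); `indexed` and `stored` are standard Solr field attributes, but the values chosen here are only a guess at what is needed. Because the whole value is indexed as one lowercased token, a prefix query still has to escape the space, as in `addressLine1:14320\ 21*`. Note that on older Solr releases (before 3.6) wildcard terms are not run through the analyzer, so lowercasing the query term on the client is the safe choice there.

```xml
<!-- Hypothetical field declarations: both fields use the keyword-tokenized,
     lowercased type, so "14320 21st St" is indexed as the single token "14320 21st st" -->
<field name="name" type="lowerCaseString" indexed="true" stored="true"/>
<field name="addressLine1" type="lowerCaseString" indexed="true" stored="true"/>
```

With fields declared this way, a query such as `name:fis* AND addressLine1:14320\ 21*` should match no matter how the original value was capitalized. Changing a field's type means the existing documents have to be reindexed.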

Upvotes: 1

arun

Reputation: 11023

One simple solution is to keep the field type as string but lowercase the value when you index the data, and then lowercase the query on the client side too.
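A sketch of the schema side of this approach, assuming the stock `string` type (`solr.StrField`) from the example schema; the `addressLine1` declaration is hypothetical. A StrField is not analyzed at all, so the client is responsible for lowercasing both the value it indexes and the query term.

```xml
<!-- Stock untokenized type (already present in the example schema):
     the value is indexed exactly as the client sends it -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical field; the client sends e.g. "14320 21st st", already lowercased -->
<field name="addressLine1" type="string" indexed="true" stored="true"/>
```

A query built the same way, e.g. `addressLine1:14320\ 21*` with the term lowercased by the client, then matches because both sides were normalized identically. The trade-off is that the stored value is lowercased as well, so if the original capitalization has to be displayed, it needs to be kept in a separate stored-only field.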

Upvotes: 0
