menakshisundaram

Reputation: 193

Solr: querying data that contains whitespace

I am new to Solr. I have data in Solr like "name":"John Lewis". The query fq=name%3A+%22John+Lewis%22 (URL-decoded: name: "John Lewis") was built in the Solr console and works well.

My requirement is to search for a term coming from my Java layer as "JohnLewis", which has to match "John Lewis" in the Solr index.

This search is not restricted to the name field (two words with a space in between). I have other values such as "Cash Reward Credit Cards", which has four words, and the user would query "CashRewardCreditCards".

Could someone help me with this? Can it be handled in schema.xml with any of the parsers or analyzers available in Solr?

Upvotes: -1

Views: 984

Answers (3)

Anand

Reputation: 81

Look at WordDelimiterFilterFactory.

It has a splitOnCaseChange property. If you set it to 1, JohnLewis is split into the tokens John and Lewis.

You'll need to add this to your query analyzer. If the user searches for JohnLewis, the search will be translated to John Lewis.
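A minimal sketch of such a field type, assuming a hypothetical type name text_camelcase and that the name field (or a copy of it) is switched to this type and the data reindexed. The word delimiter filter runs only in the query analyzer and before the lowercase filter, because lowercasing first would remove the case changes it splits on. The autoGeneratePhraseQueries="true" attribute is an assumption added here so that the split tokens are searched as the phrase "john lewis" rather than as independent terms; check its behavior against your Solr version.

```xml
<!-- Hypothetical type name; adjust to your schema -->
<fieldType name="text_camelcase" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <!-- "John Lewis" is indexed as the tokens john, lewis -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Split "JohnLewis" into John, Lewis; must run before lowercasing -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Because the query analyzer only splits on case changes, a query that already contains the space, such as name:"John Lewis", should still produce the same john, lewis tokens.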

Upvotes: 0

David George

Reputation: 3752

Assuming your input is CamelCase as shown, I would use Solr's Word Delimiter Filter with the splitOnCaseChange parameter on the query side of your analyzer as a starting point. This takes an input token such as CashRewardCreditCards and generates the tokens Cash, Reward, Credit and Cards.
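On Solr 6.5 and later, WordDelimiterFilterFactory is deprecated in favor of WordDelimiterGraphFilterFactory. The fragment below is a sketch of only the query-side analyzer with the graph variant; it would sit inside a fieldType whose index analyzer tokenizes on whitespace and lowercases. If the graph filter were also used at index time, the reference guide says it must be followed by solr.FlattenGraphFilterFactory; that is not needed at query time. How the split tokens are combined into a query (phrase versus independent terms) depends on the fieldType's autoGeneratePhraseQueries setting and the Solr version, so treat this as a starting point.

```xml
<!-- Query-side analyzer only; goes inside a fieldType definition -->
<analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- CashRewardCreditCards -> Cash, Reward, Credit, Cards -->
    <filter class="solr.WordDelimiterGraphFilterFactory" splitOnCaseChange="1"/>
    <!-- Lowercase after splitting so the tokens match a lowercased index -->
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```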

See also:

https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-WordDelimiterFilter

Upvotes: 0

Ashraful Islam

Reputation: 12830

You need to create a custom fieldType.

First, define a fieldType in your Solr schema:

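<fieldType name="word_concate" class="solr.TextField" indexed="true" stored="false">
    <!-- No type attribute: this analyzer is used at both index and query time -->
    <analyzer>
        <!-- Strip all whitespace before tokenizing: "John Lewis" becomes "JohnLewis" -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s*" replacement=""/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
</fieldType>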

Here we named the fieldType word_concate and used the char filter solr.PatternReplaceCharFilterFactory.

A char filter is a component that pre-processes input characters. Char filters can be chained like token filters and are placed in front of a tokenizer. PatternReplaceCharFilterFactory uses a regular expression to replace or change character patterns.

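The pattern \s* matches zero or more whitespace characters. Replacing every match with an empty string removes all whitespace, so "John Lewis" becomes "JohnLewis" before tokenization.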

Second, create a field with word_concate as its type:

<field name="cfname" type="word_concate"/>

Copy your name field into cfname with a copyField:

<copyField source="name" dest="cfname"/>

Third, reindex the data.

Now you can query cfname:"JohnLewis", and it will return the document whose name is John Lewis.
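If the Java layer may send the value in a different case (for example "johnlewis"), one possible variant, assuming case-insensitive matching is acceptable, adds solr.LowerCaseFilterFactory after the tokenizer. Because the single analyzer is used at both index and query time, stored values and queries are lowercased alike. The type name word_concate_ci is hypothetical; cfname would need to use it and the data would need to be reindexed.

```xml
<!-- Hypothetical case-insensitive variant of word_concate -->
<fieldType name="word_concate_ci" class="solr.TextField" indexed="true" stored="false">
    <analyzer>
        <!-- Remove all whitespace: "Cash Reward Credit Cards" becomes "CashRewardCreditCards" -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s*" replacement=""/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Lowercase the token so "JohnLewis" and "johnlewis" both become "johnlewis" -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With this type, cfname:"JohnLewis" and cfname:"johnlewis" would be expected to match the same documents.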

Upvotes: 1
