Robson

Reputation: 45

Apache Solr filter query containing "-" doesn't work

I have a problem with Apache Solr.

My results include a field named url, with values like these:

http://domain.com/ru-RU/someLink
http://domain.com/de-DE/someLink
http://domain.com/en-EN/someLink
http://domain.com/cl-EN/someLink
http://domain.com/ka-EN/someLink

When I add a filter query (fq) parameter to my query:

http://ip:port/solr/example/select?q=someSentence&fq=url:ru-RU&wt=json&indent=true

it works well, but only for the de-DE and ru-RU languages.

When I try to filter with en-EN, I get results containing cl-EN and ka-EN too.

What is the problem, and how can I resolve it?

Upvotes: 0

Views: 694

Answers (2)

Saurabh Chaturvedi

Reputation: 2166

Create a field type named urlFilter with a custom analyzer in your schema.xml, as below:

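<fieldType name="urlFilter" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- Whitespace tokenizer: a URL with no spaces stays a single token -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <!-- preserveOriginal="1" keeps the whole URL token (e.g. .../ru-RU/...) next to its word parts -->
      <filter class="solr.WordDelimiterFilterFactory" generateNumberParts="1" stemEnglishPossessive="1" generateWordParts="1" preserveOriginal="1" catenateWords="1"/>
      <!-- Lowercase every token so matching is case-insensitive -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>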

Then use this field type for your url field in schema.xml:

<field name="url" type="urlFilter" indexed="true" stored="true"/>

After reindexing (the new analyzer applies only to documents indexed after the change), query like this:

http://ip:port/solr/example/select?q=someSentence&fq=url:*ru-RU*&wt=json&indent=true
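If you send this request from code, the fq value must be URL-encoded so that the ":" and "*" reach Solr unchanged. Below is a minimal Python sketch using the standard library's urlencode. The host and port (localhost:8983) and the helper name build_select_url are hypothetical stand-ins for the answer's "ip:port". The core name example comes from the URL above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace localhost:8983 with your own "ip:port".
SOLR_SELECT = "http://localhost:8983/solr/example/select"


def build_select_url(q: str, locale: str) -> str:
    """Build a select URL with a wildcard filter query on the url field.

    urlencode() percent-encodes ':' as %3A and '*' as %2A, so the fq value
    is transmitted intact in the query string.
    """
    params = {
        "q": q,
        "fq": f"url:*{locale}*",  # e.g. url:*ru-RU*
        "wt": "json",
        "indent": "true",
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("someSentence", "ru-RU"))
```

Letting urlencode do the escaping is safer than concatenating strings by hand, particularly when the locale value comes from user input.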

This should work. Let me know if it helps :)
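The following rough Python simulation suggests why this setup can distinguish en-EN from cl-EN. It rests on several assumptions, not on Solr's actual code. The whitespace tokenizer keeps each URL as one token. preserveOriginal="1" keeps that whole token. LowerCaseFilter lowercases it. The wildcard pattern is lowercased too. Word parts are approximated by splitting on non-alphanumerics, and catenated and case-change parts are omitted. The helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Set

# Sample values modeled on the question's url field.
URLS = [
    "http://domain.com/ru-RU/someLink",
    "http://domain.com/de-DE/someLink",
    "http://domain.com/en-EN/someLink",
    "http://domain.com/cl-EN/someLink",
    "http://domain.com/ka-EN/someLink",
]


def index_terms(url: str) -> Set[str]:
    """Approximate indexed terms: the lowercased original token plus word parts."""
    original = url.lower()  # kept whole thanks to preserveOriginal="1"
    parts = {p.lower() for p in re.split(r"[^0-9A-Za-z]+", url) if p}
    return {original} | parts


def wildcard_filter(urls: List[str], pattern: str) -> List[str]:
    """Keep URLs where any indexed term matches the lowercased wildcard pattern."""
    pat = pattern.lower()
    return [u for u in urls if any(fnmatchcase(t, pat) for t in index_terms(u))]


if __name__ == "__main__":
    print(wildcard_filter(URLS, "*en-EN*"))
```

Only the preserved original token still contains "-". That is why *en-EN* matches .../en-en/... but not .../cl-en/....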

Upvotes: 1

skm

Reputation: 426

Check the analyzer of your url field in schema.xml. The value is probably being split on "-", so en-EN produces the separate tokens en and EN. For example, StandardTokenizerFactory breaks en-EN into en and EN, and de-DE into de and DE. Check the query-time analyzer too: if StandardTokenizerFactory is also used at query time, fq=url:en-EN is likewise broken into the tokens en and EN. If those query tokens are combined with OR, the filter matches any URL that contains the token EN, which would explain why cl-EN and ka-EN are returned as well. For more about tokenizers, see: https://cwiki.apache.org/confluence/display/solr/Tokenizers
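To illustrate this explanation, here is a rough Python simulation. It is an approximation, not Solr's real StandardTokenizer. It splits on every non-alphanumeric character, whereas the real UAX#29 rules keep, for example, domain.com together. It also assumes a lowercase filter and that the query tokens are combined with OR. The helper names are hypothetical.

```python
import re
from typing import List, Set

# Sample values modeled on the question's url field.
URLS = [
    "http://domain.com/ru-RU/someLink",
    "http://domain.com/de-DE/someLink",
    "http://domain.com/en-EN/someLink",
    "http://domain.com/cl-EN/someLink",
    "http://domain.com/ka-EN/someLink",
]


def analyze(text: str) -> Set[str]:
    """Rough stand-in for tokenizer + lowercase filter: split on non-alphanumerics."""
    return {tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok}


def filter_or(urls: List[str], fq_value: str) -> List[str]:
    """Keep a URL if it shares ANY token with the analyzed filter value."""
    query_tokens = analyze(fq_value)
    return [u for u in urls if analyze(u) & query_tokens]


if __name__ == "__main__":
    for value in ("ru-RU", "de-DE", "en-EN"):
        print(value, "->", filter_or(URLS, value))
```

Under these assumptions, ru-RU and de-DE each reduce to a token (ru, de) that only one URL contains. en-EN reduces to en, which the cl-EN and ka-EN URLs also contain.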

Upvotes: 2
