Anirudh Jadhav

Reputation: 1007

Not able to exclude a partial string in Solr 5.3.1?

The string is:

<GET:notes/count><GET:notes/search_note><GET:util/codemaps/([^/]+?)><GET:users/pending_requests><GET:users/pending_activation><GET:users/firstnames><GET:users/profile><GET:tasks/tasks/count><GET:school/schools/count><GET:school/classrooms/count><GET:quiz/count><GET:quiz/quizset/count><GET:notes/([^/]+?)><GET:locations/counties/count><GET:lesson/books/count><GET:general/codemaps/([^/]+?)><GET:discussions/topics/count><GET:admin/sessions><GET:admin/sessions/count><GET:admin/sessions/([^/]+?)><PUT:content/actions><POST:content/html/totext><GET:content/multimedia/images/([^/]+?)/([^/]+?)>

My query is:

log_message:"*emaps/\(\[\^/\]\+\?\)\>*"

Here log_message is a field and its type is text_std_token_lower_case. The field type definition is:

<fieldType name="text_std_token_lower_case" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Upvotes: 3

Views: 108

Answers (1)

Gus

Reputation: 6871

The tokenizer you have chosen (StandardTokenizerFactory) ignores punctuation characters. You can see this on the analysis page in the Solr admin UI. This affects the tokenization of both your query and your field, so you will need a tokenizer that does not omit punctuation.
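A quick way to see the effect outside the admin UI is to run one of the fragments through the tokenizer directly, using the Lucene library Solr is built on. A minimal sketch (class name mine, Lucene 5.x API):

import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StandardTokenizerDemo {
    public static void main(String[] args) throws Exception {
        // Feed one of the problematic fragments through StandardTokenizer,
        // the same tokenizer text_std_token_lower_case uses.
        StandardTokenizer tokenizer = new StandardTokenizer();
        tokenizer.setReader(new StringReader("util/codemaps/([^/]+?)"));

        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            // Prints "util" and "codemaps": the slashes and all of the
            // regex punctuation are silently discarded.
            System.out.println(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
    }
}

Query terms get the same treatment, so the escaped punctuation in your wildcard query is searching for tokens that were never indexed.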

One possible option is the Regular Expression Tokenizer documented on the Solr wiki (https://cwiki.apache.org/confluence/display/solr/Tokenizers). Perhaps you are looking for something like this?

<analyzer>
  <tokenizer class="solr.PatternTokenizerFactory" pattern="(>?<(PUT|GET|POST):)|>\s"/>
</analyzer>
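Since PatternTokenizerFactory with no group attribute splits the input on matches of the pattern, you can preview the tokens it should produce with plain java.util.regex. A minimal sketch (sample input shortened):

import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternTokenDemo {
    public static void main(String[] args) {
        // The same regex the analyzer above uses as its split pattern.
        Pattern splitter = Pattern.compile("(>?<(PUT|GET|POST):)|>\\s");

        String log = "<GET:notes/count><GET:notes/search_note><GET:util/codemaps/([^/]+?)> ";

        // Prints: [, notes/count, notes/search_note, util/codemaps/([^/]+?)]
        // The leading empty string comes from the match at position 0;
        // PatternTokenizer itself skips zero-length tokens, so the indexed
        // field should hold only the three path tokens, punctuation intact.
        System.out.println(Arrays.toString(splitter.split(log)));
    }
}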

That may require some tweaking if the URLs can contain > characters that are not percent-encoded, if HEAD requests are possible, and so on. I am not confident this will perform well, however, since regular expressions can become expensive; if it bogs things down you might need to write your own tokenizer.
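For what it's worth, a hand-rolled tokenizer for this shape of input does not have to be much code. A minimal sketch, assuming Lucene 5.x and a hypothetical class name; it would still need a small TokenizerFactory wrapper before schema.xml could reference it:

import org.apache.lucene.analysis.util.CharTokenizer;

// Treat '<', '>' and whitespace as delimiters and keep every other
// character, so "<GET:notes/count>" becomes the single token
// "GET:notes/count" with its punctuation intact.
public final class AngleBracketTokenizer extends CharTokenizer {
    @Override
    protected boolean isTokenChar(int c) {
        return c != '<' && c != '>' && !Character.isWhitespace(c);
    }
}

Note that unlike the pattern above this keeps the GET:/PUT:/POST: prefix inside the token, so pick whichever granularity your queries need.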

Upvotes: 1
