Reputation: 1007
The string is:
<GET:notes/count><GET:notes/search_note><GET:util/codemaps/([^/]+?)><GET:users/pending_requests><GET:users/pending_activation><GET:users/firstnames><GET:users/profile><GET:tasks/tasks/count><GET:school/schools/count><GET:school/classrooms/count><GET:quiz/count><GET:quiz/quizset/count><GET:notes/([^/]+?)><GET:locations/counties/count><GET:lesson/books/count><GET:general/codemaps/([^/]+?)><GET:discussions/topics/count><GET:admin/sessions><GET:admin/sessions/count><GET:admin/sessions/([^/]+?)><PUT:content/actions><POST:content/html/totext><GET:content/multimedia/images/([^/]+?)/([^/]+?)>
My query is:
<pre>log_message:"*emaps/\(\[\^/\]\+\?\)\>*"</pre>
Here log_message is the field, and its type is text_std_token_lower_case. The tokenizer and filter are:
<fieldType name="text_std_token_lower_case" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
Upvotes: 3
Views: 108
Reputation: 6871
The tokenizer you have chosen (StandardTokenizerFactory) discards most punctuation characters. You can see this on the Analysis page in the Solr admin UI. This affects the tokenization of both your query and your field, so you will need a tokenizer that does not drop punctuation.
One possible option is to use the Regular Expression Pattern Tokenizer documented on the Solr wiki (https://cwiki.apache.org/confluence/display/solr/Tokenizers). Perhaps you are looking for something like this?
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="(>?&lt;(PUT|GET|POST):)|>\s"/>
</analyzer>
That may require some tweaking if the URLs can contain > characters that are not percent-encoded, or if other methods such as HEAD are possible. I am not confident that this will perform well, however, since regular expressions can become expensive. If this bogs things down, you might need to write your own tokenizer.
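To get a feel for what that pattern does before touching the schema, here is a rough sketch in Python that splits a shortened version of the string from the question on the same regex. It assumes the tokenizer treats the pattern as a delimiter and drops empty tokens, which is my understanding of PatternTokenizerFactory's default behavior, and it assumes Python's `re` and Java's regex engine agree on this simple pattern. The function name `split_like_pattern_tokenizer` is made up for illustration, and the groups are written as non-capturing so that `re.split` does not return the delimiters.

```python
import re

# Same regex as the analyzer above, without XML escaping.
# Non-capturing groups are used so re.split() returns only the pieces between matches.
DELIMITER = re.compile(r"(?:>?<(?:PUT|GET|POST):)|>\s")


def split_like_pattern_tokenizer(text):
    """Approximate a split-on-pattern tokenizer: split on DELIMITER, drop empty pieces."""
    return [piece for piece in DELIMITER.split(text) if piece]


# Shortened sample taken from the string in the question.
sample = "<GET:notes/count><GET:util/codemaps/([^/]+?)><PUT:content/actions>"
tokens = split_like_pattern_tokenizer(sample)

for token in tokens:
    print(token)
```

Note that the last token keeps its trailing `>`, because the second alternative only matches a `>` followed by whitespace. That is one of the tweaks you would probably want to make. Checking the real output on the Analysis page is still the authoritative test.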
Upvotes: 1