Reputation: 3978
I have a text field with the following definition:
<fieldType name="myTextField" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="40"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
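To see why partial words match, it helps to look at what the index-time NGramFilterFactory stores. The plain-Java sketch below uses a hypothetical helper (not Lucene's NGramTokenFilter, whose emission order may differ) that lists every n-gram between the two size bounds for a single token. Because the tokenizer runs before the filter, grams are built per token, so no gram ever spans the space between "about" and "solr".

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper: returns every substring of `token` whose length is
    // between minGram and maxGram (inclusive), ordered by start position, then length.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Same bounds as the fieldType: minGramSize="1", maxGramSize="40"
        System.out.println(ngrams("solr", 1, 40));
        // prints [s, so, sol, solr, o, ol, olr, l, lr, r]
    }
}
```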
Now, I have a field that contains this text: "Hi this is a question about solr"
And another field that contains this text: "aaa solr bbb"
When my query string is "about solr", I'm getting both fields as a result, though I only want the first one because it is the only one that contains all the characters (including the whitespace). This happens not only with whitespace but also with other special characters like ":". Searching for (about solr) in parentheses doesn't help.
NOTE: I'm escaping my string before searching for it:
String s1 = ClientUtils.escapeQueryChars(s);
Any suggestions?
Upvotes: 1
Views: 2480
Reputation: 242
You can use a Solr phrase query. Your query syntax will look like:
String query = "\"about solr\"";
Then it will match only the field that you require.
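If the phrase comes from user input, the quotes have to be added around the text in code. A minimal sketch, assuming the classic Lucene/Solr query syntax, where inside a quoted phrase only the backslash and the double quote need escaping; `toPhraseQuery` is a hypothetical helper, not part of SolrJ:

```java
public class PhraseQuerySketch {
    // Hypothetical helper: escapes backslashes first, then double quotes,
    // and wraps the result in quotes so the query parser treats it as one phrase.
    static String toPhraseQuery(String input) {
        String escaped = input.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toPhraseQuery("about solr"));   // prints "about solr" (with the quotes)
        System.out.println(toPhraseQuery("say \"hi\""));   // prints "say \"hi\""
    }
}
```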
Your current query is interpreted like:
String query = "about OR solr";
So it matches both fields. The reason is that the default operator of the Solr query parser is "OR". Check your schema.xml file; it has the following entry:
<solrQueryParser defaultOperator="OR"/>
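The difference between OR, AND, and a phrase query can be illustrated with a toy model in plain Java. This is only a sketch: it lowercases and splits on whitespace instead of running the real analysis chain, ignores the n-gram expansion, and does no scoring; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class BooleanMatchSketch {
    // Toy analysis: lowercase and split on whitespace (not Solr's real chain).
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // defaultOperator="OR": matches if the document contains ANY query term.
    static boolean matchesOr(String doc, String query) {
        List<String> docTokens = tokens(doc);
        return tokens(query).stream().anyMatch(docTokens::contains);
    }

    // AND: the document must contain ALL query terms, in any order.
    static boolean matchesAnd(String doc, String query) {
        return tokens(doc).containsAll(tokens(query));
    }

    // Phrase: the query terms must appear next to each other, in order.
    static boolean matchesPhrase(String doc, String query) {
        return Collections.indexOfSubList(tokens(doc), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        String doc1 = "Hi this is a question about solr";
        String doc2 = "aaa solr bbb";
        String q = "about solr";
        System.out.println(matchesOr(doc1, q) + " " + matchesOr(doc2, q));         // true true
        System.out.println(matchesAnd(doc1, q) + " " + matchesAnd(doc2, q));       // true false
        System.out.println(matchesPhrase(doc1, q) + " " + matchesPhrase(doc2, q)); // true false
    }
}
```

With OR, "aaa solr bbb" matches just because it contains solr; AND and the phrase both also require about, and the phrase additionally requires the two terms to be adjacent and in order.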
Hope this clears up your doubt.
For more details, please refer to the links below:
http://www.solrtutorial.com/solr-query-syntax.html
http://www.solrtutorial.com/schema-xml.html
Upvotes: 2
Reputation: 1787
This is the expected behavior of Solr. The default operator is OR, so you will need to query about AND solr to get the behavior you want. If you want to change the default, use the q.op parameter: with q.op=AND, the query about solr will be processed the way you expect. However, changing the default from OR to AND is generally not a good idea, because OR is what is usually assumed. Instead, change your query to use AND.
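Both options can be sketched in plain Java (Java 10 or later, for `URLEncoder.encode(String, Charset)`). `q` and `q.op` are the Solr request parameters mentioned above; the helper names are hypothetical, and the escaping of individual terms is left out to avoid a SolrJ dependency.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class QopSketch {
    // Hypothetical helper: URL-encodes q and q.op for a Solr request.
    // q.op changes the default operator for this request only.
    static String buildParams(String userQuery, String defaultOperator) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", userQuery);
        params.put("q.op", defaultOperator);
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Hypothetical helper for the recommended alternative: keep OR as the default
    // and join the whitespace-separated terms with an explicit AND. Each term should
    // still be escaped (e.g. with ClientUtils.escapeQueryChars) after splitting,
    // since escaping the whole string first would also escape the spaces.
    static String andQuery(String userQuery) {
        return String.join(" AND ", userQuery.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(buildParams("about solr", "AND")); // q=about+solr&q.op=AND
        System.out.println(andQuery("about solr"));           // about AND solr
    }
}
```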
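The standard tokenizer breaks your phrases at whitespace and special characters. There is no simple list of those characters: StandardTokenizerFactory follows the Unicode word-break rules (UAX #29), so as a rule of thumb, any non-alphanumeric character acts like whitespace.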
Read more about analyzers here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Update: Characters that break tokens for StandardTokenizerFactory include, for example, &, . and -. So this sentence: "Me & my Dog went for a walk. The dog chased a toy-squirrel." will be analyzed as => [Me] [my] [Dog] [went] [for] [a] [walk] [The] [dog] [chased] [a] [toy] [squirrel].
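The splitting can be approximated in plain Java. This is a rough sketch, not Lucene's implementation: it treats every run of letters and digits as a token, while the real StandardTokenizer applies the Unicode word-break rules and so, for example, keeps "dog's" or "3.14" as single tokens. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class StandardTokenizerApprox {
    // Rough approximation: every run of Unicode letters/digits is a token,
    // and everything else (spaces, &, ., -, ...) acts as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part); // skip the empty string produced by a leading separator
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Me & my Dog went for a walk. The dog chased a toy-squirrel."));
        // prints [Me, my, Dog, went, for, a, walk, The, dog, chased, a, toy, squirrel]
    }
}
```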
If you want to break only at whitespace, use WhitespaceTokenizerFactory.
Update: The only characters that break tokens for WhitespaceTokenizerFactory are whitespace characters such as spaces and newlines. So the sentence "Me & my Dog went for a walk. The dog chased a toy-squirrel." will be analyzed as => [Me] [&] [my] [Dog] [went] [for] [a] [walk.] [The] [dog] [chased] [a] [toy-squirrel.].
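For comparison, here is a rough plain-Java approximation of whitespace-only splitting, which keeps punctuation attached to the surrounding characters. It is a sketch, not Lucene's WhitespaceTokenizer, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceTokenizerApprox {
    // Splits only where Character.isWhitespace is true (spaces, tabs, newlines, ...);
    // every other character, including &, . and -, stays inside its token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Me & my Dog went for a walk. The dog chased a toy-squirrel."));
        // prints [Me, &, my, Dog, went, for, a, walk., The, dog, chased, a, toy-squirrel.]
    }
}
```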
Upvotes: 5