Reputation: 429
I'm trying to use the terms component as described in the Solr docs (see Using the Terms Component for an Auto-Suggest Feature).
Running Solr 6.3.0.
I currently have 4 docs in my index:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"on",
"wt":"json",
"_":"1482239790124"}},
"response":{"numFound":4,"start":0,"docs":[
{
"id":"1",
"title":["There's nothing better than a shiny red apple on hot summer day."],
"_version_":1554244409915080704},
{
"id":"2",
"title":["Eat an apple!"],
"_version_":1554244409917177856},
{
"id":"3",
"title":["I prefer a Grannie Smith apple over Fuji."],
"_version_":1554244409917177857},
{
"id":"4",
"title":["Apricots is kinda like a peach minus the fuzz."],
"_version_":1554244409917177858}]
}
}
My field definition looks like this (otherwise my schema.xml is vanilla):
<field name="title" type="strings" indexed="true" stored="true"/>
My terms component is the default (as is my whole solrconfig.xml):
<searchComponent name="terms" class="solr.TermsComponent"/>
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
When I make a request like http://localhost:8983/solr/test/terms?terms.fl=title&terms.prefix=ap, I expect the following in return:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="terms">
<lst name="title">
<int name="apple">3</int>
<int name="Apricots">1</int>
</lst>
</lst>
</response>
But what I actually get is an empty response.
When I instead do http://localhost:8983/solr/test/terms?terms.fl=title&terms.prefix=Ea
I get:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="terms">
<lst name="title">
<int name="Eat an apple!">1</int>
</lst>
</lst>
</response>
So it is kind of working, but it is case-sensitive and only matches from the beginning of the whole string.
What I want: make it work for all words contained in the title field (like in the docs) and make the search case-insensitive.
What I tried: changing indexed and stored; setting multiValued=false; using type=string.
I'm guessing that it has something to do with the data type or how Solr stores the field, but I can't figure it out.
Upvotes: 1
Views: 259
Reputation: 429
Thanks to Mats for pointing me in the right direction.
The field type of my title field was indeed wrong and I needed to use another one. While creating my own, I noticed that the default schema.xml has a bunch of predefined field types that do exactly what I wanted.
In my case, I just set my field's type to text_de:
<field name="title" type="text_de" indexed="true" stored="true"/>
Where text_de was predefined like:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
<filter class="solr.GermanNormalizationFilterFactory"/>
<filter class="solr.GermanLightStemFilterFactory"/>
</analyzer>
</fieldType>
Upvotes: 0
Reputation: 52792
If you want to lowercase the contents of a field when indexing, you'll either have to preprocess the content (making it lowercase before indexing it) or, easier, use a field type that has a LowerCaseFilter. That field type has to be based on a TextField, but you can use the KeywordTokenizer to keep every value as a single token, instead of it being tokenized on whitespace or something similar.
The terms handler just looks for tokens that match, so by using a KeywordTokenizer, you keep everything as a single token, and the LowerCaseFilter makes sure that the indexed token is stored exclusively in lowercase.
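As a rough sketch of that first option, a field type along these lines should work. The name lowercase_keyword is made up for illustration, and multiValued="true" is an assumption based on the array values shown in the question:

```xml
<!-- Hypothetical field type: the whole value stays one token, stored in lowercase -->
<fieldType name="lowercase_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizer emits the entire field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- LowerCaseFilter lowercases that token before it is indexed -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- multiValued="true" is assumed here; adjust to match your data -->
<field name="title" type="lowercase_keyword" indexed="true" stored="true" multiValued="true"/>
```

You have to reindex after changing the field type. Also keep in mind that terms.prefix is compared against the indexed tokens as-is and is not analyzed, so the client should send the prefix in lowercase (terms.prefix=ea rather than Ea).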
If you instead want to split each term in the content into its own token, i.e. apricots, is, kinda, etc., use a WhitespaceTokenizer or the StandardTokenizer, together with a LowerCaseFilter.
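A minimal sketch of the per-word variant, again with a made-up type name (text_suggest) and multiValued="true" assumed from the question's data:

```xml
<!-- Hypothetical field type: one lowercase token per word -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- StandardTokenizer splits on word boundaries and drops punctuation such as "!" -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- LowerCaseFilter turns "Apricots" into "apricots" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- multiValued="true" is assumed here; adjust to match your data -->
<field name="title" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
```

After reindexing the four documents from the question, a request with terms.fl=title&terms.prefix=ap should then list apple and apricots, both in lowercase, since the terms component returns the indexed tokens rather than the stored text.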
Upvotes: 1