Reputation: 4655
I am implementing full-text search using Solr and would appreciate some help with a problem I am facing.
My schema.xml looks as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="products" version="1.2">
<types>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="concatenated" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.LowerCaseTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
<filter class="solr.WordDelimiterFilterFactory"
splitOnCaseChange="0"
splitOnNumerics="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="1"
preserveOriginal="1"
/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="keyid" type="long" indexed="true" stored="false" required="true"/>
<field name="combined" type="concatenated" indexed="true" stored="false"/>
</fields>
<uniqueKey>keyid</uniqueKey>
<defaultSearchField>combined</defaultSearchField>
<copyField source="keyid" dest="keyid"/>
<solrQueryParser defaultOperator="OR"/>
</schema>
And my data-config.xml file looks as follows:
<dataConfig>
<document name="products">
<entity name="product" query="SELECT ProductId AS keyid, CONVERT(VARCHAR(18), ProductId) + ' ' + ProductName AS combined FROM Products">
<field column="keyid" name="keyid"/>
<field column="combined" name="combined"/>
</entity>
</document>
</dataConfig>
And I have a record like the following in my Products table:
ProductId|ProductName
239289231|Windows 7
Assuming a successful setup and indexing (using localhost:8089/solr/dataimport?command=full-import), why would I not get results when I run this query:
Scenario 1: localhost:8089/solr/select?q=combined:239289231
Yet the queries below do give me results (one searching the keyid field and the other the combined field):
Scenario 2: localhost:8089/solr/select?q=combined:Windows
Scenario 3: localhost:8089/solr/select?q=keyid:239289231
Is the problem the TokenizerFactory or the FilterFactory that I am using here? Shouldn't Solr treat ProductId as a string after it is cast to VARCHAR and concatenated, and hence make it possible to search for it the way I am doing in Scenario 1?
Upvotes: 0
Views: 183
Reputation: 22555
Yes, the issue here is the tokenizer. Your tokenizer, the LowerCaseTokenizerFactory, splits the text at non-letter characters and discards them, so the digits are stripped out completely. That is why you cannot search for and find any of your ProductId values. In your example case, it is only indexing the word Windows (lowercased).
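To see why the digits disappear, here is a rough Python sketch of what a letter-only, lowercasing tokenizer does to the example value. It only approximates the behaviour described above (the function name `lowercase_tokenizer` is made up for illustration) and is not Solr's actual implementation:

```python
import re

def lowercase_tokenizer(text):
    """Approximate a letter-only tokenizer: keep runs of letters, lowercased.

    Hypothetical helper for illustration; digits, spaces and punctuation
    act as separators and are thrown away.
    """
    # [^\W\d_]+ matches one or more letters (word characters minus digits and underscore).
    return [match.group(0).lower() for match in re.finditer(r"[^\W\d_]+", text)]

tokens = lowercase_tokenizer("239289231 Windows 7")
print(tokens)  # ['windows']
```

Neither 239289231 nor 7 survives tokenization, so there is nothing in the combined field for a numeric query to match.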
I am assuming you want to lowercase the values, so you would want to use the StandardTokenizerFactory as your tokenizer and the LowerCaseFilterFactory as a filter to lowercase the values. That will include the ProductId value as a token to be indexed, and NGrams will be built against the following tokens: 239289231, Windows and 7.
Here is a suggested modified fieldType:
<fieldType name="concatenated" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
maxGramSize="15" side="front"/>
<filter class="solr.WordDelimiterFilterFactory"
splitOnCaseChange="0"
splitOnNumerics="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="1"
preserveOriginal="1"
/>
</analyzer>
</fieldType>
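As a rough illustration of the first three steps of that chain (tokenize, lowercase, front edge n-grams), here is a small Python sketch. It is a simplification under stated assumptions: the `\w+` split is only a stand-in for the StandardTokenizerFactory, the WordDelimiterFilterFactory step is left out, and all function names are hypothetical:

```python
import re

def standard_like_tokenize(text):
    # Rough stand-in for the standard tokenizer: keep runs of letters AND digits.
    return re.findall(r"\w+", text)

def edge_ngrams(token, min_size=1, max_size=15):
    # Front edge n-grams: prefixes of length min_size up to max_size (or the token length).
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]

def index_terms(text):
    # Tokenize, lowercase each token, then expand it into its front edge n-grams.
    terms = []
    for token in standard_like_tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms

terms = index_terms("239289231 Windows 7")
print("239289231" in terms)  # True
print("windows" in terms)    # True
```

Because the digits now survive tokenization, the full ProductId (and each of its prefixes) ends up among the indexed terms.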
Also, I would recommend reviewing the Analyzers, Tokenizers and Token Filters page on the Solr Wiki for examples of how the various ones work, if you have not already. In this case, I believe it was just a mix-up between a tokenizer and a filter.
Upvotes: 1