Reputation: 462
I have NGram filtering enabled for a keywords field I am indexing, which contains the following comma-separated terms:
wwwdebenhams.com, ebenhams.com, dbenhams.com, deenhams.com, debnhams.com, debehams.com, debenams.com, debenhms.com, debenhas.com, debenham.com, debenhams.ocm, debenhams.con, debenhams.comn, debenhams.copm, debenhams.comm, debenhams.coom, debenhams.xom, debenhams.cpm, ebenhams.com, dbenhams.com, deenhams.com, debnhams.com, debehams.com, debenams.com, debenhms.com, debenhas.com, debenham.com,
The schema for the core looks like this:
<?xml version="1.0" ?>
<schema name="merchant" version="1.0">
<types>
<!--
Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
-->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="text_lowercase_ngram" class="solr.TextField" termPositions="false" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
splitOnCaseChange="0"
splitOnNumerics="0"
stemEnglishPossessive="0"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
preserveOriginal="1"
types="wdfftypes.txt"
/>
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
splitOnCaseChange="0"
splitOnNumerics="0"
stemEnglishPossessive="0"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
preserveOriginal="1"
types="wdfftypes.txt"
/>
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
</analyzer>
</fieldType>
<fieldType name="text_exact" class="solr.TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<!-- Merchant Fields -->
<field name="id" type="int" indexed="true" stored="true" required="true"/>
<field name="site_id" type="int" indexed="true" stored="true" required="true"/>
<field name="title" type="text_lowercase_ngram" indexed="true" stored="true"/>
<field name="url" type="text_exact" indexed="true" stored="true"/>
<field name="keywords" type="text_lowercase_ngram" indexed="true" stored="true" />
<field name="description" type="text_lowercase_ngram" indexed="true" stored="true" />
<field name="type" type="int" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true"/>
<field name="category" type="text_exact" indexed="true" stored="true" multiValued="true"/>
</fields>
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>title</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
</schema>
Searching 'deb' returns the matching document with a score of 6.4406505. Searching 'debe', 'deben', 'debenh' and 'debenha' returns no results. Searching 'debenham' returns the matching document with a score of 41.740173 and 'debenhams' returns the document with a score of 111.30711.
I have tried using the query analyzer which shows matching terms for each of the above queries, yet I am not seeing the matching document coming back in the results. Is there a way I can return ALL documents with corresponding scores regardless of whether they are a positive match or not in order to better understand why they are not being returned?
Upvotes: 1
Views: 1699
Reputation: 60195
First of all, you should remove the NGramFilterFactory at query time. You don't need to make n-grams of the query, and that is probably what is messing up your results. Also, is it possible that you are looking at only the first ten results? Solr uses a default of rows=10; you can increase it, or page through the results using the start parameter. Have a look at the numFound value returned with your query, which contains the total number of results, even if you don't see them all.
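As a rough sketch of the first suggestion, the query analyzer of text_lowercase_ngram could keep the same tokenizer and WordDelimiterFilterFactory settings as in the question's schema and simply omit the NGramFilterFactory, leaving the index analyzer unchanged. This is an assumption about what the change would look like, not a tested configuration; the core would need to be reloaded for it to take effect.

```xml
<!-- Query-time analyzer for text_lowercase_ngram (sketch, untested).
     Same settings as in the question's schema, minus the n-gram filter,
     so each query term is matched as a whole against the indexed n-grams. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="0"
          splitOnNumerics="0"
          stemEnglishPossessive="0"
          generateWordParts="1"
          generateNumberParts="1"
          catenateWords="0"
          catenateNumbers="0"
          catenateAll="0"
          preserveOriginal="1"
          types="wdfftypes.txt"
  />
  <!-- NGramFilterFactory intentionally left out at query time -->
</analyzer>
```

With this in place, a query such as 'debe' would be looked up as a single term rather than being split into its own n-grams first, which should make the scores easier to reason about.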
Upvotes: 1