Reputation: 813
Query: mpn:"MEM-CF-512MB-AOK"
Solr response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "fl": "id, mpn, name",
      "indent": "true",
      "q": "mpn:\"MEM-CF-512MB-AOK\"",
      "_": "1375801439480",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "1340120",
        "mpn": "MEM-CF-256MB-AOK",
        "name": "256MB CompactFlash"
      },
      {
        "id": "1340129",
        "mpn": "MEM-CF-512MB-AOK",
        "name": "512MB CompactFlash"
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "correctlySpelled",
      true
    ]
  }
}
Expected:
{
  "id": "1340129",
  "mpn": "MEM-CF-512MB-AOK",
  "name": "512MB CompactFlash"
}
I need each of the following searches to match that document:
1) MEM-CF-512MB-AOK
2) MEM-CF-512MB
3) MEM-CF-512MB-AO
4) M-CF-512MB-AOK
5) -CF-512MB-AOK
schema.xml:
<field name="mpn" type="text_general_edge_ngram" indexed="true" stored="true"/>
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
Upvotes: 1
Views: 330
Reputation: 33341
LowerCaseTokenizer is functionally equivalent to a LetterTokenizer followed by a LowerCaseFilter. Judging by the case you've provided, you don't want LetterTokenizer-like behavior, which emits only runs of consecutive letters and throws away everything else, digits and hyphens included. Effectively, before the n-gramming, both part numbers are reduced to the same tokens:
mem, cf, mb, aok
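To make the collision concrete, here is a rough Python sketch of what that tokenizer does to the two part numbers. It is only an approximation and not Solr code: the function name is made up for illustration, and it assumes ASCII letters, whereas the real tokenizer works on Unicode letters.

```python
import re

def letter_lowercase_tokens(text):
    """Approximate LowerCaseTokenizer: keep runs of letters, lowercased.

    Assumption: ASCII letters only; digits, hyphens and spaces act as separators.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]

for mpn in ("MEM-CF-256MB-AOK", "MEM-CF-512MB-AOK"):
    # Both values print the same token list, because "256" and "512" are dropped.
    print(mpn, "->", letter_lowercase_tokens(mpn))
```

Since the digits never reach the index, the 256MB and 512MB documents look identical to the query, which is why both come back.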
I think what you want is a KeywordTokenizer followed by a LowerCaseFilter, which keeps the whole part number as a single lowercased token.
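Since you want to be able to search with characters missing at the end as well as at the beginning, you also need to perform a prefix query. For the missing-at-the-beginning case, the index has to contain grams formed by taking characters off the front of the value, such as:
mem-cf-512mb-aok, em-cf-512mb-aok, m-cf-512mb-aok, -cf-512mb-aok
Note that in EdgeNGramFilterFactory these are the grams side="back" produces; side="front" keeps the start of the value and trims the end instead. To also pick up matches with characters missing at the end, a simple prefix search on top of those grams should work, like:
m-cf-512mb-a*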
minGramSize="1" is almost certainly overzealous. You likely don't want 1-grams (i.e. matching on just "k"). The shortest case in your list is 12 characters long, for instance. I'll guess 5 as a reasonable minimum gram size.
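<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- side="back" yields the suffix grams listed above; side="front" would yield prefixes.
       Assumption: your Solr release still accepts side="back". As far as I know it was dropped
       around 4.4, where wrapping a front EdgeNGram between two ReverseStringFilterFactory
       filters is the usual substitute. -->
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="5" maxGramSize="50" side="back"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>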
And again, you should use queries appended with a trailing wildcard.
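As a sanity check, the idea can be simulated in a few lines of Python. This is only a sketch and not Solr itself: it assumes the index holds the suffix grams listed earlier (whole value lowercased, minimum gram length 5), treats a trailing-wildcard query as a plain startswith test, and uses hypothetical function names.

```python
def suffix_grams(value, min_gram=5, max_gram=50):
    """Lowercase the whole value (keyword-style) and return every suffix
    whose length lies between min_gram and max_gram."""
    token = value.lower()
    return {
        token[start:]
        for start in range(len(token))
        if min_gram <= len(token) - start <= max_gram
    }

def prefix_match(query, indexed_value):
    """Emulate the wildcard query `query*` against the grams of one document."""
    q = query.strip().lower()
    return any(gram.startswith(q) for gram in suffix_grams(indexed_value))

searches = [
    "MEM-CF-512MB-AOK",
    "MEM-CF-512MB",
    "MEM-CF-512MB-AO",
    "M-CF-512MB-AOK",
    "-CF-512MB-AOK",
]
for s in searches:
    # Columns: search term, hit on the 512MB document, hit on the 256MB document.
    print(s, prefix_match(s, "MEM-CF-512MB-AOK"), prefix_match(s, "MEM-CF-256MB-AOK"))
```

In this simulation all five searches hit the 512MB part number and none hit the 256MB one. When sending the last search to Solr, keep in mind that a leading hyphen is normally read as the NOT operator, so it would need to be escaped.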
Upvotes: 2
Reputation: 9049
The scenario you've described looks like an exact match on the mpn field.
However, you've defined mpn as an edge n-gram field with minGramSize=1, so indexing starts from 1-grams, which I imagine isn't what you need.
To get this sorted, I guess you could add another field for exact matching (keeping the n-gram field if you need it for another reason) and run your exact query against it. For example:
mpn_exact:"MEM-CF-512MB-AOK"
You could test this out using the Analysis page of your Solr Admin console, which shows the tokens each field type produces at index and query time.
Upvotes: 0