Reputation: 9028
My intent is to search for a phrase against multiple fields.
{
  "multi_match" : {
    "query" : "king of baro",
    "fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
    "type" : "phrase_prefix",
    "boost" : 10.0,
    "tie_breaker" : 0.0
  }
}
The above query returns "king of baroda" and works as expected.
But when I search for "king of bar", it doesn't return anything.
{
  "multi_match" : {
    "query" : "king of bar",
    "fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
    "type" : "phrase_prefix",
    "boost" : 10.0,
    "tie_breaker" : 0.0
  }
}
Summary:
Search for "king of bar" - No result
Search for "king of baro" - returns "king of baroda"
Search for "king of baroda" - returns "king of baroda"
Is there any configuration I am missing?
Mapping file:
http://localhost:9200/sec/_mapping/
{
  "sec": {
    "mappings": {
      "sec": {
        "properties": {
          "filed1": { "type": "string" },
          "filed2": { "type": "string" },
          "filed3": { "type": "string" },
          "filed4": { "type": "string" },
          "filed5": { "type": "string" },
          "filed6": { "type": "string" },
          "filed7": { "type": "string" }
        }
      }
    }
  }
}
Analyzer, from elasticsearch.yml:
index:
  analysis:
    analyzer:
      security_edge_ngram_analyzer:
        alias: [security_edge_ngram_analyzer]
        tokenizer: security_edge_ngram_tokenizer
    tokenizer:
      security_edge_ngram_tokenizer:
        type: edgeNGram
Upvotes: 0
Views: 883
Reputation: 52368
First, I would double-check that my custom analyzer is working as expected. The way I do this is to use fielddata_fields:
GET sec/sec/_search
{
  "fielddata_fields": ["filed1", "filed2", "filed3", "filed4", "filed5", "filed6", "filed7"]
}
A proper edgeNGram setup would result in an output like this:
"fields": {
"filed1": [
"ki",
"kin",
"king",
"king ",
"king o",
"king of",
"king of ",
"king of b",
"king of ba",
"king of bar",
"king of baro",
"king of barod",
"king of baroda"
]
}
If you don't see something similar, then I'd look at how the analyzer is set up and whether its configuration is OK. As a second way of checking this, I'd create a simple test index where I would set the custom analyzer directly on a field and run the same test as above:
PUT /sec
{
  "mappings": {
    "sec": {
      "properties": {
        "filed1": {
          "type": "string",
          "analyzer": "security_edge_ngram_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "security_edge_ngram_analyzer": {
          "tokenizer": "security_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "security_edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}
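To verify that the analyzer is actually being applied, I'd then index a small sample document into that test index and inspect the field data the same way as above (a minimal sketch; the document is hypothetical and its text just mirrors the one from the question):

PUT /sec/sec/1
{
  "filed1": "king of baroda"
}

GET /sec/sec/_search
{
  "fielddata_fields": ["filed1"]
}

If everything is wired correctly, the fields section of the hit should show the same token progression as above ("ki", "kin", ..., "king of baroda").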
Upvotes: 1
Reputation: 8718
My guess would be that you have your edge ngram tokenizer configured with min_gram set to 4, though it's hard to tell for sure without seeing the configuration.
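One quick way to confirm or rule that out (a rough sketch, reusing the index and analyzer names from the question; the exact _analyze syntax depends on your Elasticsearch version) is to run the indexed text through the analyzer and look at the tokens it emits:

curl -XGET 'localhost:9200/sec/_analyze?analyzer=security_edge_ngram_analyzer' -d 'king of baroda'

The length of the shortest token in the response tells you the effective min_gram.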
Here's an example of how I set up an edge ngram analyzer on a per-field basis in this blog post for Qbox:
PUT /test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "edge_ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text_field": {
          "type": "string",
          "index_analyzer": "edge_ngram_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
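To sanity-check an index like this (a small, hypothetical test; the document and query simply mirror the ones from the question):

PUT /test_index/doc/1?refresh=true
{
  "text_field": "king of baroda"
}

GET /test_index/doc/_search
{
  "query": {
    "multi_match": {
      "query": "king of bar",
      "fields": ["text_field"],
      "type": "phrase_prefix"
    }
  }
}

With the edge ngrams generated at index time, this search should return the document for "king of bar" as well as for the longer prefixes.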
Upvotes: 2