user1578872

Reputation: 9028

Elasticsearch - Multi match - phrase search

My intent is to search for a phrase against multiple fields.

{
  "multi_match" : {
    "query" : "king of baro",
    "fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
    "type" : "phrase_prefix",
    "boost" : 10.0,
    "tie_breaker" : 0.0
  }
}

The above query returns "king of baroda", which is what I expect.

But when I search for "king of bar", it doesn't return anything.

{
  "multi_match" : {
    "query" : "king of bar",
    "fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
    "type" : "phrase_prefix",
    "boost" : 10.0,
    "tie_breaker" : 0.0
  }
}

Summary:

Search for "king of bar" - no result
Search for "king of baro" - returns "king of baroda"
Search for "king of baroda" - returns "king of baroda"

Is there any configuration I am missing?

Mapping:

http://localhost:9200/sec/_mapping/

{  
   "sec":{  
      "mappings":{  
         "sec":{  
            "properties":{  
               "filed1":{  
                  "type":"string"
               },
               "filed2":{  
                  "type":"string"
               },
               "filed3":{  
                  "type":"string"
               },
               "filed4":{  
                  "type":"string"
               },
               "filed5":{  
                  "type":"string"
               },
               "filed6":{  
                  "type":"string"
               },
               "filed7":{  
                  "type":"string"
               }
            }
         }
      }
   }
}

Analyzer, from elasticsearch.yml:

index:
  analysis:
    analyzer:

      security_edge_ngram_analyzer:
          alias: [security_edge_ngram_analyzer]
          tokenizer: security_edge_ngram_tokenizer

    tokenizer:
      security_edge_ngram_tokenizer:
        type: edgeNGram

Upvotes: 0

Views: 883

Answers (2)

Andrei Stefan

Reputation: 52368

First, I would double-check that the custom analyzer is working as expected. The way I do this is to use fielddata_fields:

GET sec/sec/_search
{
  "fielddata_fields": ["filed1","filed2","filed3","filed4","filed5","filed6","filed7"]
}

A proper edgeNGram setup would result in an output like this:

        "fields": {
           "filed1": [
              "ki",
              "kin",
              "king",
              "king ",
              "king o",
              "king of",
              "king of ",
              "king of b",
              "king of ba",
              "king of bar",
              "king of baro",
              "king of barod",
              "king of baroda"
           ]
        }

If you don't see something similar, then I'd look at how the analyzer is set up and whether its configuration is correct. As a second way of checking this, I'd create a simple test index where the custom analyzer is set directly on a field, and test it the same way as above:

PUT /sec
{
  "mappings": {
    "sec": {
      "properties": {
        "filed1": {
          "type": "string",
          "analyzer": "security_edge_ngram_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "security_edge_ngram_analyzer": {
          "tokenizer": "security_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "security_edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}
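
To see why the fielddata listing earlier in this answer is what a min_gram of 2 and max_gram of 20 should give, here is a minimal Python sketch of the prefix generation. It is only a model of the idea, not Elasticsearch's implementation: the function name edge_ngrams is made up for illustration, and it assumes the tokenizer treats the whole field value as one string, spaces included, as the listing suggests.

```python
def edge_ngrams(text, min_gram=2, max_gram=20):
    """Return every prefix of `text` with length between min_gram and max_gram.

    Hypothetical helper: a simplified model of an edge n-gram tokenizer
    that does not split the input on whitespace.
    """
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]


grams = edge_ngrams("king of baroda")
for gram in grams:
    # repr() makes the trailing spaces in grams such as 'king ' visible
    print(repr(gram))
```

If the grams printed here and the terms returned by fielddata_fields disagree (for example, the shortest or longest terms are missing), that would point at the min_gram or max_gram actually in effect on the index.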

Upvotes: 1

Sloan Ahrens

Reputation: 8718

My guess would be that you have your edge ngram tokenizer configured with min_gram set to 4, though it's hard to tell for sure without seeing the configuration.
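
To illustrate that guess, here is a rough Python sketch of which grams would end up in the index if min_gram really were 4. The helper name word_edge_ngrams is hypothetical, and the sketch assumes the text is lowercased and split on whitespace before the grams are built. It only models what gets indexed; it does not model how phrase_prefix expands the last query term.

```python
def word_edge_ngrams(text, min_gram, max_gram=20):
    """Collect the edge n-grams of each whitespace-separated word.

    Hypothetical helper: words shorter than min_gram contribute nothing.
    """
    grams = set()
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        grams.update(word[:n] for n in range(min_gram, longest + 1))
    return grams


indexed = word_edge_ngrams("king of baroda", min_gram=4)
print(sorted(indexed))
print("bar" in indexed)   # the three-letter gram is never produced
print("baro" in indexed)  # the four-letter gram is
```

Under that assumption the gram "bar" is never written to the index while "baro" is, which would be consistent with the behaviour described in the question.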

Here's an example of how I set up an edge ngram analyzer on a per-field basis in this blog post for Qbox:

PUT /test_index
{
   "settings": {
      "analysis": {
         "filter": {
            "edge_ngram_filter": {
               "type": "edge_ngram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "edge_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "edge_ngram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "text_field": {
               "type": "string",
               "index_analyzer": "edge_ngram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}
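
The point of using different analyzers at index time and at search time can be sketched in Python as below. This is a simplified model under stated assumptions, not what Elasticsearch does internally: the function names are made up, whitespace splitting stands in for the standard tokenizer, and term positions (which a phrase query also checks) are ignored.

```python
def index_terms(text, min_gram=2, max_gram=20):
    """Index-time model: lowercase, split into words, keep every edge n-gram."""
    terms = set()
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        terms.update(word[:n] for n in range(min_gram, longest + 1))
    return terms


def search_terms(text):
    """Search-time model: lowercase and split into words, with no n-grams."""
    return text.lower().split()


def all_terms_indexed(query, document):
    """True if every search-time term exists among the index-time terms."""
    indexed = index_terms(document)
    return all(term in indexed for term in search_terms(query))


doc = "king of baroda"
for query in ("king of bar", "king of baro", "king of baroda", "queen of bar"):
    print(query, "->", all_terms_indexed(query, doc))
```

Because the grams are produced only when indexing, a partial word typed by the user is looked up as-is and matches one of the stored prefixes, provided it is at least min_gram characters long.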

Upvotes: 2
