RyanHirsch

Reputation: 1847

edge_ngram filter and not_analyzed to match search

I have the following Elasticsearch configuration:

PUT /my_index
{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                },
                "snow_filter" : {
                    "type" : "snowball",
                    "language" : "English"
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "snow_filter",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

PUT /my_index/_mapping/my_type
{
    "my_type": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type":            "string",
                        "index_analyzer":  "autocomplete", 
                        "search_analyzer": "snowball"
                    },
                    "not": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}


POST /my_index/my_type/_bulk
{ "index": { "_id": 1            }}
{ "name": "Brown foxes"    }
{ "index": { "_id": 2            }}
{ "name": "Yellow furballs" }
{ "index": { "_id": 3            }}
{ "name": "my discovery" }
{ "index": { "_id": 4            }}
{ "name": "myself is fun" }
{ "index": { "_id": 5            }}
{ "name": ["foxy", "foo"]    }
{ "index": { "_id": 6            }}
{ "name": ["foo bar", "baz"] }

I want a search that returns only item 6, which has the name "foo bar", but I am not sure how to do it. This is what I am doing right now:

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query":    "foo b"
            }
        }
    }
}

I know it has to do with how the tokenizer splits the words, but I am lost on how to be both flexible and strict enough to match this. I am guessing I need a multi-field on my mapping of name, but I am not sure. How can I fix the query and/or my mapping to satisfy my needs?

Upvotes: 0

Views: 233

Answers (1)

Sloan Ahrens

Reputation: 8718

You're already close. Your edge_ngram filter emits tokens as short as one character, your query gets tokenized into "foo" and "b", and the default match query operator is "or". So your query matches every document that has a term starting with "foo" or "b", which is three of the docs (1, 5 and 6).
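
If it helps to see the term matching spelled out, here is a rough Python sketch of what happens at index and search time. It is only an approximation, not how Elasticsearch is implemented: it splits on whitespace instead of using the standard tokenizer, skips the snowball stemming step (which does not change the outcome for this data), and the helper names (edge_ngrams, index_terms, search) are made up for illustration.

```python
# Simplified model of the "autocomplete" analyzer: lowercase, split on
# whitespace, then emit edge n-grams. Stemming is deliberately omitted.

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return every prefix of token between min_gram and max_gram characters."""
    longest = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, longest + 1)}

def index_terms(values):
    """Collect the indexed terms for a field that holds a string or a list of strings."""
    if isinstance(values, str):
        values = [values]
    terms = set()
    for value in values:
        for token in value.lower().split():
            terms |= edge_ngrams(token)
    return terms

# Same documents as the bulk request in the question.
DOCS = {
    1: "Brown foxes",
    2: "Yellow furballs",
    3: "my discovery",
    4: "myself is fun",
    5: ["foxy", "foo"],
    6: ["foo bar", "baz"],
}

def search(query, operator="or"):
    """Return ids of docs whose terms contain any ("or") or all ("and") query terms."""
    query_terms = query.lower().split()  # search side: no n-grams applied
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, name in DOCS.items()
        if combine(term in index_terms(name) for term in query_terms)
    )

print(search("foo b"))                  # [1, 5, 6]
print(search("foo b", operator="and"))  # [6]
```

Doc 1 only matches on "b" (from "brown"), doc 5 only on "foo", and doc 6 is the only one whose terms include both, so requiring every query term narrows the result to doc 6.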

Using the "and" operator seems to do what you want:

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query":    "foo b",
                "operator": "and"
            }
        }
    }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1.4451914,
      "hits": [
         {
            "_index": "test_index",
            "_type": "my_type",
            "_id": "6",
            "_score": 1.4451914,
            "_source": {
               "name": [
                  "foo bar",
                  "baz"
               ]
            }
         }
      ]
   }
}

Here's the code I used to test it:

http://sense.qbox.io/gist/4f6fb7c1fdc6942023091ee1433d7490e04e7dea

Upvotes: 1
