Rahul

Reputation: 199

Elasticsearch prefix query

How do I do a prefix search in Elasticsearch?

For example, if I have the following indexed documents:

[{
        "id": "1",
        "key": "abc",
        "foo": [1, 2, 3]
    },
    {
        "id": "2",
        "key": "ab",
        "foo": [4]
    },
    {
        "id": "3",
        "key": "xyz",
        "foo": [9, 10]
    },
    {
        "id": "4",
        "key": "abcd",
        "foo": [12]
    }
]

Now I want to run a query on the attribute "key" with the value "abcdef".

I expect the following documents to match the query.

document id | matched | reason
"1"         | YES     | "abc" is a prefix of "abcdef"
"2"         | YES     | "ab" is a prefix of "abcdef"
"3"         | NO      | "xyz" is not a prefix of "abcdef"
"4"         | YES     | "abcd" is a prefix of "abcdef"

Upvotes: 0

Views: 1661

Answers (2)

Val

Reputation: 217294

Using an edge_ngram tokenizer (or token filter) is correct, but you should only apply it at search time, since the documents already contain the prefixes you're searching for.

So your index settings and mappings should look like this:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "prefix_tokenizer"
        }
      },
      "tokenizer": {
        "prefix_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 6,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "index": {
      "max_ngram_diff": 10
    }
  },
  "mappings": {
    "properties": {
      "key": {
        "type": "text",
        "analyzer": "keyword",
        "search_analyzer": "prefix_analyzer"
      }
    }
  }
}

Then, your search query can look like this:

POST test/_search
{
  "query": {
    "match": {
      "key": "abcdef"
    }
  }
}

What is going to happen is that the input string abcdef will get tokenized into the following tokens (see the _analyze call below):

  • ab
  • abc
  • abcd
  • abcde
  • abcdef
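
A quick way to verify this tokenization is the _analyze API (a sketch, run against the test index created above):

POST test/_analyze
{
  "analyzer": "prefix_analyzer",
  "text": "abcdef"
}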

In the results you'll get:

  • The first token will match document 2
  • The second token will match document 1
  • The third token will match document 4

Upvotes: 1

Bhavya

Reputation: 16172

You can use the edge_ngram tokenizer to query the attribute "key" with the value "abcdef".

Adding a working example

Index Mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 6,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 10
  },
  "mappings": {
    "properties": {
      "key": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
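
Index Data:

The documents from the question need to be indexed before running the search below. A sketch for one of them (the index name 67419529 is taken from the search result below; index the other documents the same way):

PUT 67419529/_doc/1
{
  "id": "1",
  "key": "abc",
  "foo": [1, 2, 3]
}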

Search Query:

{
  "query": {
    "match": {
      "key": "abcdef"
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "67419529",
        "_type": "_doc",
        "_id": "4",
        "_score": 1.8710749,
        "_source": {
          "id": "4",
          "key": "abcd",
          "foo": [
            12
          ]
        }
      },
      {
        "_index": "67419529",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0498221,
        "_source": {
          "id": "1",
          "key": "abc",
          "foo": [
            1,
            2,
            3
          ]
        }
      },
      {
        "_index": "67419529",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.44839138,
        "_source": {
          "id": "2",
          "key": "ab",
          "foo": [
            4
          ]
        }
      }
    ]

Upvotes: 0
