Reputation: 23
Hi, I am looking for a search function in Elasticsearch that can match in the middle of words. Right now our search only works with prefixes: to find "company name" we have to search with "c", "n", "co", "comp", "na", or "nam", but the requirement is that searching with "mp", "any", "ame", "me", or "p" should also return "company name". Please advise how to handle this. Is there such search functionality? I tried wildcards, but I could not get them working across multiple fields. Please let me know if I am missing anything, or suggest how to achieve this.
Upvotes: 2
Views: 146
Reputation: 16192
You can use the n-gram tokenizer, which first breaks text down into words whenever it encounters one of a list of specified characters, and then emits n-grams of each word within the specified length range. Note that because min_gram is 1 and max_gram is 20 below, the index-level max_ngram_diff setting must also be raised (the default allowed difference between min_gram and max_gram is only 1).
Adding a working example with index data, mapping, search query, and results.
Index Mapping:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Analyze API
GET /64975316/_analyze
{
  "analyzer": "my_analyzer",
  "text": "company name"
}
The following tokens are generated:
{
  "tokens": [
    { "token": "c", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "co", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "com", "start_offset": 0, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "comp", "start_offset": 0, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "compa", "start_offset": 0, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "compan", "start_offset": 0, "end_offset": 6, "type": "word", "position": 5 },
    { "token": "company", "start_offset": 0, "end_offset": 7, "type": "word", "position": 6 },
    { "token": "o", "start_offset": 1, "end_offset": 2, "type": "word", "position": 7 },
    { "token": "om", "start_offset": 1, "end_offset": 3, "type": "word", "position": 8 },
    { "token": "omp", "start_offset": 1, "end_offset": 4, "type": "word", "position": 9 },
    { "token": "ompa", "start_offset": 1, "end_offset": 5, "type": "word", "position": 10 },
    { "token": "ompan", "start_offset": 1, "end_offset": 6, "type": "word", "position": 11 },
    { "token": "ompany", "start_offset": 1, "end_offset": 7, "type": "word", "position": 12 },
    { "token": "m", "start_offset": 2, "end_offset": 3, "type": "word", "position": 13 },
    { "token": "mp", "start_offset": 2, "end_offset": 4, "type": "word", "position": 14 },
    { "token": "mpa", "start_offset": 2, "end_offset": 5, "type": "word", "position": 15 },
    { "token": "mpan", "start_offset": 2, "end_offset": 6, "type": "word", "position": 16 },
    { "token": "mpany", "start_offset": 2, "end_offset": 7, "type": "word", "position": 17 },
    { "token": "p", "start_offset": 3, "end_offset": 4, "type": "word", "position": 18 },
    { "token": "pa", "start_offset": 3, "end_offset": 5, "type": "word", "position": 19 },
    { "token": "pan", "start_offset": 3, "end_offset": 6, "type": "word", "position": 20 },
    { "token": "pany", "start_offset": 3, "end_offset": 7, "type": "word", "position": 21 },
    { "token": "a", "start_offset": 4, "end_offset": 5, "type": "word", "position": 22 },
    { "token": "an", "start_offset": 4, "end_offset": 6, "type": "word", "position": 23 },
    { "token": "any", "start_offset": 4, "end_offset": 7, "type": "word", "position": 24 },
    { "token": "n", "start_offset": 5, "end_offset": 6, "type": "word", "position": 25 },
    { "token": "ny", "start_offset": 5, "end_offset": 7, "type": "word", "position": 26 },
    { "token": "y", "start_offset": 6, "end_offset": 7, "type": "word", "position": 27 },
    { "token": "n", "start_offset": 8, "end_offset": 9, "type": "word", "position": 28 },
    { "token": "na", "start_offset": 8, "end_offset": 10, "type": "word", "position": 29 },
    { "token": "nam", "start_offset": 8, "end_offset": 11, "type": "word", "position": 30 },
    { "token": "name", "start_offset": 8, "end_offset": 12, "type": "word", "position": 31 },
    { "token": "a", "start_offset": 9, "end_offset": 10, "type": "word", "position": 32 },
    { "token": "am", "start_offset": 9, "end_offset": 11, "type": "word", "position": 33 },
    { "token": "ame", "start_offset": 9, "end_offset": 12, "type": "word", "position": 34 },
    { "token": "m", "start_offset": 10, "end_offset": 11, "type": "word", "position": 35 },
    { "token": "me", "start_offset": 10, "end_offset": 12, "type": "word", "position": 36 },
    { "token": "e", "start_offset": 11, "end_offset": 12, "type": "word", "position": 37 }
  ]
}
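To sanity-check this token set without a cluster, the tokenizer's behavior for this mapping (min_gram 1, max_gram 20, token_chars letter/digit) can be emulated in plain Python. This is an illustrative sketch, not Elasticsearch's implementation; the function name ngram_tokens is made up for this example.

```python
import re

def ngram_tokens(text, min_gram=1, max_gram=20):
    """Emulate the n-gram tokenizer: split into letter/digit words,
    then emit every substring of length min_gram..max_gram."""
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            for i in range(len(word) - n + 1):
                tokens.append(word[i:i + n])
    return tokens

tokens = ngram_tokens("company name")
print(len(tokens))  # 38 tokens, matching the analyze output above

# Every substring from the question is an indexed token,
# which is why the match queries below find the document:
for q in ["mp", "any", "ame", "me", "p"]:
    print(q, q in tokens)  # each prints True
```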
Index Data:
{
  "body": "company name"
}
Search Query:
{
  "query": {
    "match": {
      "body": "ame"
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "64975316",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.941854,
    "_source": {
      "body": "company name"
    }
  }
]
Upvotes: 1