Arjun Sankar

Reputation: 190

Implementing search using Elasticsearch

I am currently implementing Elasticsearch in my application. Assume that "Hello World" is the data we need to search. Our requirement is that this document is returned when the keyword entered is "h", "Hello World", or "Hello Worlds".

This is our current query.

{
  "query": {
    "wildcard": {
      "message": {
        "title": "h*"
      }
    }
  }
}

With this query we get the right result for the keyword "h", but we also need to get results when the keyword contains small spelling mistakes.

Upvotes: 2

Views: 83

Answers (2)

Amit

Reputation: 32376

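You need to use the english analyzer, which stems tokens to their root form, so that "Worlds" is reduced to "world". More information is available in the Elasticsearch documentation on language analyzers.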

I implemented this with your example data, query, and expected results, using an edge n-gram analyzer at index time and a match query at search time.

Index Mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "english" 
      }
    }
  }
}
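To see roughly what this mapping puts in the index, here is a minimal Python sketch of the index-time analysis. It is only an approximation: whitespace splitting stands in for the standard tokenizer (the real one follows Unicode word-boundary rules), and the function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of token whose length is between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_analyze(text):
    """Approximate the 'autocomplete' analyzer: split, lowercase, edge n-grams."""
    terms = []
    # Assumption: whitespace split instead of the real standard tokenizer.
    for token in text.split():
        terms.extend(edge_ngrams(token.lower()))
    return terms


index_terms = autocomplete_analyze("Hello World")
print(index_terms)
```

Because "h" is one of the stored terms, a plain match query for "h" can find the document without any wildcard.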

Index document

{
   "title" : "Hello World"
}

Search query for h and its result

{
  "query": {
    "match": {
      "title": "h"
    }
  }
}

 "hits": [
         {
            "_index": "so-60524477-partial-key",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.42763555,
            "_source": {
               "title": "Hello World"
            }
         }
      ]

Search query for Hello worlds; the same document comes back in the result

{
  "query": {
    "match": {
      "title": "Hello worlds"
    }
  }
}

Result

"hits": [
         {
            "_index": "so-60524477-partial-key",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.8552711,
            "_source": {
               "title": "Hello World"
            }
         }
      ]
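Why both searches hit can be sketched in Python. This is a rough model, not what Elasticsearch actually runs: `crude_stem` only strips a plural "s" as a stand-in for the english analyzer's stemmer, stop-word removal is ignored, and all function names are hypothetical. A match query combines its terms with OR by default, so one shared term is enough for a hit.

```python
def index_terms(text, min_gram=1, max_gram=10):
    """Index-time side: lowercase each word and emit its edge n-grams."""
    terms = set()
    for token in text.lower().split():
        for n in range(min_gram, min(len(token), max_gram) + 1):
            terms.add(token[:n])
    return terms


def crude_stem(token):
    """Assumption: dropping a plural 's' stands in for the real English stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def search_terms(text):
    """Search-time side: lowercase and stem, with no n-grams."""
    return [crude_stem(token) for token in text.lower().split()]


def matches(query, document):
    """Default match behaviour (OR): any shared term counts as a hit."""
    indexed = index_terms(document)
    return any(term in indexed for term in search_terms(query))


print(search_terms("Hello worlds"))
print(matches("h", "Hello World"))
print(matches("Hello worlds", "Hello World"))
```

Note that stemming only covers word-form variations such as plurals; a genuine typo like "helo" is a different problem.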

Upvotes: 2

jaspreet chahal

Reputation: 9099

Edge n-grams or n-grams perform better than wildcards. For a wildcard query, the terms in the index have to be scanned at search time to see which ones match the pattern. N-grams instead break the text into small tokens at index time. For example, with edge n-grams "Quick Foxes" will be stored as [ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ], depending on the min_gram and max_gram sizes.
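The trade-off can be illustrated with a toy Python comparison. The term list and names are invented for the example, and a real Lucene index is far more sophisticated than a dict; the point is only that a wildcard is tested against terms at query time, while edge n-grams move that work to index time.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical indexed terms, mapped to the ids of documents containing them.
TERM_TO_DOCS = {"quick": {1}, "foxes": {1}, "hello": {2}, "world": {2}}


def wildcard_search(pattern):
    """Query-time work: test the pattern against every term in the index."""
    hits = set()
    for term, docs in TERM_TO_DOCS.items():
        if fnmatchcase(term, pattern):
            hits |= docs
    return hits


def build_edge_ngram_index(min_gram=1, max_gram=20):
    """Index-time work: store every prefix so a search is a single lookup."""
    index = defaultdict(set)
    for term, docs in TERM_TO_DOCS.items():
        for n in range(min_gram, min(len(term), max_gram) + 1):
            index[term[:n]] |= docs
    return index


NGRAM_INDEX = build_edge_ngram_index()

print(wildcard_search("h*"))
print(NGRAM_INDEX.get("h", set()))
```

Both calls find the same document; the n-gram version pays for it with a larger index.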

Fuzziness can be used to find similar terms, which covers small spelling mistakes.

Mapping

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Query

GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "hello worlds",
        "fuzziness": 1
      }
    }
  }
}
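To make "fuzziness": 1 concrete, here is a small Python sketch of edit distance. It implements plain Levenshtein distance as an illustration; Elasticsearch's fuzzy matching additionally counts a swap of two adjacent characters as a single edit by default, which this sketch does not model. The helper names are hypothetical.

```python
def levenshtein(a, b):
    """Count the single-character inserts, deletes and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def within_fuzziness(query_term, indexed_term, fuzziness=1):
    """True if the two terms are at most `fuzziness` edits apart."""
    return levenshtein(query_term, indexed_term) <= fuzziness


print(levenshtein("worlds", "world"))
print(within_fuzziness("helo", "hello"))
print(within_fuzziness("help", "world"))
```

With a fuzziness of 1, "worlds" is close enough to "world" and "helo" is close enough to "hello", while unrelated terms are still rejected.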

Upvotes: 1
