I am currently implementing Elasticsearch in my application. Assume that "Hello World" is the data we need to search. Our requirement is that searching with "h", "Hello World", or "Hello Worlds" as the keyword should return this result.
This is our current query.
{
  "query": {
    "wildcard": {
      "title": {
        "value": "h*"
      }
    }
  }
}
With this query we get the right result for the keyword "h", but we also need to get results when the keyword contains small spelling mistakes.
You need to use the english analyzer, which stems tokens to their root form (for example, "worlds" becomes "world"). More info can be found here.
I implemented it with your example data, query, and expected results, using an edge n-gram analyzer for indexing and a match query for searching.
Index definition:
PUT so-60524477-partial-key
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "english"
      }
    }
  }
}
Index a sample document:
PUT so-60524477-partial-key/_doc/1
{
  "title": "Hello World"
}
Search query for "h":
GET so-60524477-partial-key/_search
{
  "query": {
    "match": {
      "title": "h"
    }
  }
}
Result:
"hits": [
  {
    "_index": "so-60524477-partial-key",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.42763555,
    "_source": {
      "title": "Hello World"
    }
  }
]
Search query for "Hello Worlds", which returns the same document:
GET so-60524477-partial-key/_search
{
  "query": {
    "match": {
      "title": "Hello worlds"
    }
  }
}
Result:
"hits": [
  {
    "_index": "so-60524477-partial-key",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.8552711,
    "_source": {
      "title": "Hello World"
    }
  }
]
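To see why these searches match, here is a rough Python model of the two analyzers (a sketch, not Elasticsearch's code). `index_analyzer`, `toy_search_analyzer` and `matches` are hypothetical names; the regex split only approximates the standard tokenizer, and the plural-stripping rule is a crude stand-in for the english analyzer's stemmer, which does much more (including stop-word removal).

```python
import re


def index_analyzer(text, min_gram=1, max_gram=10):
    """Approximates the "autocomplete" analyzer: split into words,
    lowercase, then emit the edge n-grams of each word."""
    terms = set()
    for word in re.findall(r"\w+", text.lower()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            terms.add(word[:n])
    return terms


def toy_search_analyzer(text):
    """Hypothetical stand-in for the english analyzer: lowercase and strip
    a trailing plural "s". Only enough to handle "worlds" -> "world"."""
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        terms.append(word)
    return terms


def matches(query, indexed_terms):
    """A match query with the default OR operator matches if any
    analyzed query term is among the indexed terms."""
    return any(term in indexed_terms for term in toy_search_analyzer(query))


doc_terms = index_analyzer("Hello World")
for q in ["h", "Hello World", "Worlds", "Hallo"]:
    print(q, matches(q, doc_terms))
```

The last case shows the limit of this approach: stemming handles "Worlds", but a genuine typo such as "Hallo" produces no indexed term, which is where fuzziness comes in.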
Edge n-grams or n-grams perform better than wildcards. A wildcard query has to walk the terms in the index at search time to find those that match the pattern, while n-grams do that work once, at index time, by breaking the text into small tokens. For example, with min_gram set to 2, "Quick Foxes" is stored as [ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ]; the exact tokens depend on the min_gram and max_gram sizes.
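A minimal Python sketch of what the edge_ngram tokenizer emits, assuming token_chars of letter and digit (words are split on anything else). `edge_ngram_tokenize` is a hypothetical helper, not part of any Elasticsearch client:

```python
import re


def edge_ngram_tokenize(text: str, min_gram: int, max_gram: int) -> list:
    """Rough model of an edge_ngram tokenizer with token_chars letter/digit:
    split on non-alphanumeric characters, then emit each word's leading
    n-grams from min_gram up to max_gram characters long."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):  # letters and digits only
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngram_tokenize("Quick Foxes", 2, 10))
# ['Qu', 'Qui', 'Quic', 'Quick', 'Fo', 'Fox', 'Foxe', 'Foxes']
```

With min_gram 1, as in the mapping below, the single letters "Q" and "F" would be emitted as well.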
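Fuzziness can then be used to find similar terms, that is, terms within a given number of single-character edits of the query term, so "worlds" still matches "world".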
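Fuzziness is measured as an edit distance. A plain Levenshtein sketch in Python (the helper name `levenshtein` is hypothetical) shows the idea; note that Elasticsearch's match query by default also counts a transposition of two adjacent characters as a single edit, which this version does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("worlds", "world"))  # 1
print(levenshtein("help", "hello"))    # 2
```

So with "fuzziness": 1, the query term "worlds" can match the indexed term "world", but "help" cannot match "hello".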
Mapping
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
Query
GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "hello worlds",
        "fuzziness": 1
      }
    }
  }
}
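Putting the two together, here is a rough Python model of the query above: the document is indexed as lowercase edge n-grams, and each query word is expanded to the indexed terms within one edit. It assumes the query text is split into whole lowercase words, as a `standard` search analyzer does; without a `search_analyzer`, the query would be broken into edge n-grams as well. All names are hypothetical.

```python
import re


def edge_ngrams(text, min_gram=1, max_gram=20):
    """Index side: lowercase letter/digit words, then their leading n-grams."""
    return {w[:n] for w in re.findall(r"[^\W_]+", text.lower())
            for n in range(min_gram, min(max_gram, len(w)) + 1)}


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_expand(query, indexed_terms, fuzziness=1):
    """Map each whole query word to the indexed terms within `fuzziness`
    edits; with the OR operator the document matches if any list is non-empty."""
    words = re.findall(r"[^\W_]+", query.lower())
    return {w: sorted(t for t in indexed_terms if edit_distance(w, t) <= fuzziness)
            for w in words}


print(fuzzy_expand("hello worlds", edge_ngrams("Hello World")))
```

For "hello worlds" this yields {'hello': ['hell', 'hello'], 'worlds': ['world']}, so the document matches even though "worlds" itself was never indexed.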