webness

Reputation: 55

Misspelling suggestion ("did you mean") with phrase suggest and whitespace correction with Elasticsearch

I use the default "english" analyzer for searching documents and it works pretty well. But I also need "did you mean" suggestions when the search query is misspelled, or the ability to search by such misspelled phrases.

What analyzers/filters/queries do I need to achieve such behaviour?

Source text

Elasticsearch is a distributed, open source search and analytics engine for all types of data,
including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built
on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic).
Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is
the central component of the Elastic Stack, a set of open source tools for data ingestion,
enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack 
(after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection
of lightweight shipping agents known as Beats for sending data to Elasticsearch.

Search terms

search query => did you mean XXX?

missing or wrong letter, or something similar
Elastisearch => Elasticsearch
distribated => distributed
Apacje => Apache

extra space
Elastic search => Elasticsearch

no space
opensource => open source

misspelled phrase
serach engne => search engine

Upvotes: 1

Views: 1587

Answers (1)

Amit

Reputation: 32376

Your first example (a missing or wrong letter) can be handled with the fuzzy query, and the second one with a custom analyzer that uses an ngram or edge-ngram tokenizer. For examples of the latter, please refer to my blog on autocomplete.
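To illustrate why n-grams can help with the whitespace cases, here is a minimal Python sketch. It is a simplified emulation written for this answer, not Elasticsearch's own tokenizer: the `char_ngrams` helper is hypothetical, it assumes a fixed gram size of 3, and it assumes that grams never span spaces (roughly what an ngram tokenizer does when `token_chars` is restricted to letters and digits). It also assumes the "second one" refers to the extra-space and no-space examples from the question.

```python
import re


def char_ngrams(text, min_gram=3, max_gram=3):
    """Return the set of lowercase character n-grams of each word in text.

    Simplified stand-in for an ngram tokenizer followed by a lowercase
    filter; grams are built per word, so they never contain a space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for n in range(min_gram, max_gram + 1):
            for i in range(len(word) - n + 1):
                grams.add(word[i:i + n])
    return grams


# Extra space: every gram of the query also occurs in the indexed word.
indexed = char_ngrams("Elasticsearch")
with_space = char_ngrams("Elastic search")
print(sorted(with_space - indexed))

# No space: every gram of the indexed phrase also occurs in the query.
joined = char_ngrams("opensource")
split = char_ngrams("open source")
print(sorted(split - joined))
```

Both set differences come out empty, which is why a match on an n-gram analyzed field can still find the document when a space is added or dropped. The trade-off is a larger index and looser matching, so the gram sizes usually need tuning.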

Adding a fuzzy query example on your sample docs

Index mapping

{
    "mappings": {
        "properties": {
            "title": {
                "type": "text"
            }
        }
    }
}

Index your sample docs and use the search queries below

{
    "query": {
        "fuzzy": {
            "title": {
                "value": "distributed"
            }
        }
    }
}

And the search result

 "hits": [
            {
                "_index": "didyou",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.89166296,
                "_source": {
                    "title": "distribated"
                }
            }
        ]

And for Elasticsearch

{
    "query": {
        "fuzzy": {
            "title": {
                "value": "Elasticsearch"
            }
        }
    }
}

And the search result

  "hits": [
            {
                "_index": "didyou",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.8173577,
                "_source": {
                    "title": "Elastisearch"
                }
            }
        ]
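
The two matches above follow from edit distance: the fuzzy query matches terms within a small number of single-character edits of the search value. The Python sketch below is only an illustration under stated assumptions. `levenshtein`, `auto_fuzziness`, and `build_fuzzy_query` are hypothetical helper names, not part of any Elasticsearch client. The thresholds in `auto_fuzziness` mirror the documented default `AUTO` behaviour (exact match up to 2 characters, one edit for 3 to 5, two edits beyond that). Plain Levenshtein distance is used for simplicity, whereas Elasticsearch by default also counts a swap of two adjacent characters as a single edit. Terms are compared in lowercase on the assumption that the indexed terms were lowercased by the analyzer.

```python
def levenshtein(a, b):
    """Count the inserts, deletes, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed for a term under the default AUTO thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def build_fuzzy_query(field, value, fuzziness="AUTO"):
    """Build a fuzzy query request body as a plain dict."""
    return {"query": {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}}


# (search value, indexed term) pairs taken from the question and the results above.
pairs = [
    ("distributed", "distribated"),
    ("elasticsearch", "elastisearch"),
    ("apache", "apacje"),
]
for value, indexed_term in pairs:
    distance = levenshtein(value, indexed_term)
    print(value, indexed_term, distance, distance <= auto_fuzziness(value))

print(build_fuzzy_query("title", "distributed"))
```

Each pair is one edit apart, which is within the allowance for terms of that length. Note that the fuzzy query works on single terms, so a misspelled phrase such as "serach engne" is not covered by this query alone.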

Upvotes: 2
