Pierre

Reputation: 23

How can I refine my Elasticsearch query so that it distinguishes its results better?

My challenge is to build an autocomplete field (Django and Elasticsearch) where I can search for "apeni", "rua apen" or "roa apen" and get "rua apeninos" as the top (or only) option. I have already tried suggest and completion in ES, but both match on prefixes, so they don't work with "apen". I tried wildcards as well, but they can't be combined with fuzziness, so they don't work with "roa apeni" or "apini". So now I am trying match with fuzziness.

But even when the query term is different, such as "rua ape" or "rua apot", it returns the same two docs, with street_desc equal to "rua apeninos" and "rua apotribu", both with a score of 1.0.

Query:

{
   "aggs":{
      "addresses":{
         "filters":{
            "filters":{
               "street":{
                  "match":{
                     "street_desc":{
                        "query":"rua ape",
                        "fuzziness":"AUTO",
                        "prefix_length":0,
                        "max_expansions":50
                     }
                  }
               }
            }
         },
         "aggs":{
            "street_bucket":{
               "significant_terms":{
                  "field":"street_desc.raw",
                  "size":3
               }
            }
         }
      }
   },
   "sort":[
      {
         "_score":{
            "order":"desc"
         }
      }
   ]
}

Index:

{
   "catalogs":{
      "mappings":{
         "properties":{
            "street_desc":{
               "type":"text",
               "fields":{
                  "raw":{
                     "type":"keyword"
                  }
               },
               "analyzer":"suggest_analyzer"
            }
         }
      }
   }
}

Analyzer: (python)

from elasticsearch_dsl import analyzer, token_filter, tokenizer

suggest_analyzer = analyzer(
    'suggest_analyzer',
    tokenizer=tokenizer("lowercase"),
    filter=[token_filter('stopbr', 'stop', stopwords="_brazilian_")],
    language="brazilian",
    char_filter=["html_strip"]
)

Upvotes: 1

Views: 30

Answers (1)

Amit

Reputation: 32386

Here is an end-to-end working example, which I tested against all the given search terms.

Index mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}
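
To see why this mapping lets a partial word such as "apen" match, here is a rough Python sketch of the terms the autocomplete analyzer would put in the index. It is only an approximation: it splits on whitespace instead of using the real standard tokenizer, and the function names edge_ngrams and index_terms are made up for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` between min_gram and max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=10):
    """Approximate the 'autocomplete' analyzer: split, lowercase, edge n-gram.

    Assumption: whitespace splitting stands in for the standard tokenizer,
    which is close enough for simple street names like the ones here.
    """
    terms = []
    for token in text.lower().split():
        terms.extend(edge_ngrams(token, min_gram, max_gram))
    return terms


if __name__ == "__main__":
    print(index_terms("rua apeninos"))
    print(index_terms("rua apotribu"))
```

Because search_analyzer is set to standard, the query text itself is not n-grammed, so a query token like "apen" is compared as a whole against these stored prefixes.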

Index sample docs

{
   "title" : "rua apotribu"
}

{
   "title" : "rua apeninos"
}

Search queries

{
    "query": {
        "match": {
            "title": {
                "query": "apeni",
                "fuzziness":"AUTO"
            }
        }
    }
}
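
Since the question uses Django, the same request body can be built in Python. This is a minimal sketch; build_autocomplete_query is a hypothetical helper name, and the field defaults to title as in the mapping above. Note that the match clause sits under the top-level query key. In the question it sits inside a filters aggregation with no top-level query, in which case Elasticsearch falls back to match_all and gives every hit a score of 1.0, which would explain the identical scores seen there.

```python
import json


def build_autocomplete_query(text, field="title", fuzziness="AUTO"):
    """Build a search body whose hits and _score are driven by a fuzzy match.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the body; it can be sent with whichever client the project uses.
    print(json.dumps(build_autocomplete_query("roa apen"), indent=2))
```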

And search result

  "hits": [
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.1026623,
                "_source": {
                    "title": "rua apeninos"
                }
            }
        ]

The query apen also returns a result:

 "hits": [
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "1",
                "_score": 2.517861,
                "_source": {
                    "title": "rua apeninos"
                }
            }
        ]

And when the query terms are different, such as rua apot, it returns both docs, with a much higher score for rua apotribu, as shown in the search result below.

 "hits": [
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "2",
                "_score": 2.9289336,
                "_source": {
                    "title": "rua apotribu"
                }
            },
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.41107285,
                "_source": {
                    "title": "rua apeninos"
                }
            }
        ]
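
The typo cases can be reasoned about the same way. With "fuzziness": "AUTO", terms of 0 to 2 characters must match exactly, terms of 3 to 5 characters may differ by one edit, and longer terms by two. The sketch below applies that rule to the stored prefixes of each document. It is a simplification: it uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition of adjacent characters as a single edit, and all function names are illustrative.

```python
def auto_max_edits(term):
    """Edit budget that "fuzziness": "AUTO" assigns to a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def prefixes(word):
    """All prefixes of a word, standing in for its edge n-grams."""
    return [word[:n] for n in range(1, len(word) + 1)]


def fuzzy_matches(query_term, indexed_terms):
    """Indexed terms within the AUTO edit budget of the query term."""
    limit = auto_max_edits(query_term)
    return [t for t in indexed_terms if levenshtein(query_term, t) <= limit]


apeninos_terms = prefixes("rua") + prefixes("apeninos")
apotribu_terms = prefixes("rua") + prefixes("apotribu")

if __name__ == "__main__":
    for term in ("roa", "apini", "apot"):
        print(term, fuzzy_matches(term, apeninos_terms), fuzzy_matches(term, apotribu_terms))
```

Under these assumptions, "apot" matches prefixes of apotribu but none of apeninos, so for rua apot the second document is matched only through the shared "rua" token, which is consistent with its much lower score.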

Upvotes: 1
