Imran Azad

Reputation: 1048

Confused about elasticsearch query

        POST http://localhost:9200/test2/drug?pretty
        {
          "title": "I can do this"
        }


        get test2/drug/_search
        {
          "query" : {
            "match": {
              "title": "cancer"
            }
          }
        }

The mappings are:

        {
           "test2": {
              "mappings": {
                 "drug": {
                    "properties": {
                       "title": {
                          "type": "string"
                       }
                    }
                 }
              }
           }
        }

Running the above query returns the document. I want to understand what Elasticsearch is doing behind the scenes. Judging by the output of the default analyzer, "cancer" is not tokenized into "can", so why is a document containing the word "can" being returned? In other words, what other processing is applied to the search query "cancer"?

Updated

Is there a command I can run on my box that will clear all indexes so I have a clean slate? I ran delete /*, which succeeded, but I am still getting a match.

Upvotes: 2

Views: 101

Answers (2)

Andrei Stefan

Reputation: 52368

If you are using Sense, the problem with your test is the lowercase get request. In Sense the method should be written GET (capital letters).

The explanation comes down to the GET vs. POST HTTP methods. Behind the scenes, Sense converts a GET request that has a body into an HTTP POST (given that many browsers do not support HTTP GET requests with a request body). This means that, even if you write GET, the actual HTTP request is a POST.

Because Sense's autocomplete forces upper-case request methods, it compares against the same upper-case string when deciding whether a request is a GET (and not a lowercase get) with a request body. If it is, the request is transformed into a POST. If the comparison fails and Sense decides it is not a GET, it sends the request as is, meaning with a get method and a body. Since that body is ignored, what reaches Elasticsearch is a bare test2/drug/_search, which is effectively a match_all.
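
To make the case-sensitivity point concrete, here is a minimal Python sketch of the decision described above. It is not Sense's actual code: the function name `effective_request` and its return shape are hypothetical, and the only behavior it models is "exact match on GET plus a body becomes POST, anything else goes out as typed".

```python
from typing import Tuple


def effective_request(method: str, has_body: bool) -> Tuple[str, bool]:
    """Hypothetical model of the check described in the answer.

    Returns (method actually sent, whether the query body is honored).
    """
    # Case-sensitive comparison: only the exact string "GET" is rewritten.
    if method == "GET" and has_body:
        return "POST", True
    # Everything else is sent as typed. For a get-style method the body
    # is ignored, so the search behaves like a match_all.
    body_honored = has_body and method.upper() != "GET"
    return method, body_honored


print(effective_request("GET", True))  # ('POST', True)
print(effective_request("get", True))  # ('get', False)
```

Under this model, the lowercase get in the question loses its match query, which is why every document in test2/drug comes back.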

Upvotes: 1

G Quintana

Reputation: 4667

I guess that you configured in your index mappings an NGram filter or tokenizer. Let's suppose (I hope you'll confirm my hypothesis) an Edge NGram is configured. You can check it with:

        GET test2/_mapping

The document is then tokenized as: i, c, ca, can, d, do, t, th, thi, this. As a result, in the index, the token "can" points to the document "I can do this".

When you search for "cancer", the tokens c, ca, can, canc, cance, cancer are produced by the same analysis chain and then looked up in the index. Because c, ca and can are in the index, your document is found.
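
The overlap can be reproduced outside Elasticsearch with a small Python sketch. It only imitates an edge n-gram filter on lowercased, whitespace-split words; the helper name `edge_ngrams` and its `min_gram`/`max_gram` defaults are assumptions chosen for illustration, not values read from any real index.

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Imitate an edge n-gram filter: emit every prefix of each word."""
    tokens = []
    for word in text.lower().split():
        # Prefixes of length min_gram up to the word length (capped at max_gram).
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


indexed = set(edge_ngrams("I can do this"))
searched = set(edge_ngrams("cancer"))

# Tokens produced on both sides are what make the document match.
print(sorted(indexed & searched))  # ['c', 'ca', 'can']
```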

With the NGram filter, you often need to configure a different analyzer for search than for indexing, for instance:

  • index_analyzer/analyzer: standard + edge ngram
  • search_analyzer: standard alone

Then if you search for "can", you'll find documents containing can, cancer, candy... But if you search for "cancer", you'll only find documents containing cancer, cancerology... and so on.
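
The effect of using different analyzers at index time and search time can be sketched in Python as well. This is a toy inverted index, not Elasticsearch: `build_index`, `search` and the three sample documents are hypothetical, the index side emits edge n-grams, and the search side looks up whole lowercased words only.

```python
def edge_ngrams(word):
    """All prefixes of a word, standing in for an edge n-gram filter."""
    return {word[:n] for n in range(1, len(word) + 1)}


def build_index(docs):
    """Index side: lowercase, split on whitespace, then edge n-grams."""
    index = {}
    for doc in docs:
        for word in doc.lower().split():
            for token in edge_ngrams(word):
                index.setdefault(token, set()).add(doc)
    return index


def search(index, query):
    """Search side: lowercase and split only, no n-grams."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return hits


docs = ["I can do this", "cancer research", "candy shop"]
index = build_index(docs)

print(sorted(search(index, "can")))     # all three documents
print(sorted(search(index, "cancer")))  # ['cancer research']
```

Because the query is no longer expanded into prefixes, "cancer" stops matching documents that merely share the prefix "can".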

Upvotes: 0
