Fewster

ElasticSearch multiple exact search on field returns no results

I'm struggling with this, which I feel should work but maybe I'm doing something stupid. This search:

{
  "query":
  {
    "bool":
    {
      "must":[
        {"match":{"Element.sourceSystem.name":"Source1 Source2"}}
      ]
    }
  }
}

returns data for both Source1 and Source2. I would expect adding a terms query, as below, to return a subset of the first search containing only the Source1 documents. However, nothing is returned, whether the terms query is combined with the first query or run on its own.

{
  "query":
  {
    "bool":
    {
      "must":[
        {"match":{"Element.sourceSystem.name":"Source1 Source2"}},
        {"terms":{"Element.sourceSystem.name":["Source1"]}}
      ]
    }
  }
}

I realise this is hard without seeing the documents, but suffice it to say that "Element.sourceSystem.name" exists and is available, since the first search works fine. All input gratefully received.


Answers (1)

Slomo

There are some things that are handled differently in match queries than in terms queries.

First of all, a detour to analyzers:

Assume you are using Elasticsearch's standard analyzer, which consists of the standard tokenizer and a few token filters. The standard tokenizer splits your text into terms on whitespace, punctuation marks and some other special characters. The details can be found in the Elasticsearch documentation, so for now let's just say 'each word becomes a term'.

The second very important part of the analyzer is the lowercase token filter, which converts every term to lowercase. This means that, later on, searching for Source1 and source1 with an analyzed query should yield the same results.

So a short example:

Input: "This is my input text in English." will be analyzed into the following terms: "this", "is", "my", "input", "text", "in", "english".
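As a rough illustration of that analysis step, here is a minimal Python sketch. `standard_analyze` is a hypothetical helper, and the regex split on `\w+` is only an approximation: the real standard tokenizer follows Unicode word-boundary rules and is not implemented this way.

```python
import re


def standard_analyze(text: str) -> list[str]:
    """Rough approximation of the standard analyzer (hypothetical helper):
    split into word tokens, then lowercase each one like the lowercase filter."""
    # \w+ keeps letters, digits and underscores; everything else separates tokens.
    tokens = re.findall(r"\w+", text)
    return [token.lower() for token in tokens]


print(standard_analyze("This is my input text in English."))
# ['this', 'is', 'my', 'input', 'text', 'in', 'english']
print(standard_analyze("Source1 Source2"))
# ['source1', 'source2']
```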

All of this happens when you index a document into a text field, for example. I assume Element.sourceSystem.name is of this type, since your plain match query seems to work.

Now, when you issue a match query with "Source1 Source2", the same analysis happens and turns it into the tokens source1 and source2. Internally, it then creates two term queries combined with a boolean OR (or is the match query's default operator). So either source1 or source2 must match for a document to be returned by your query.
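That rewrite can be sketched in Python. `match_as_bool` is a hypothetical helper that builds an Elasticsearch-style bool query with one `term` clause per analyzed token; it assumes the standard analyzer (approximated by a regex) and mirrors the behaviour described above rather than the exact Lucene internals.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_as_bool(field, text):
    """Approximate a match query (default operator "or") as a bool query
    with one should clause per analyzed token (hypothetical helper)."""
    return {
        "bool": {
            "should": [{"term": {field: token}} for token in analyze(text)],
            # At least one of the term clauses has to match.
            "minimum_should_match": 1,
        }
    }


query = match_as_bool("Element.sourceSystem.name", "Source1 Source2")
print(query)
```

Note that the term clauses carry the already lowercased tokens, which is why the match query finds documents indexed as source1 and source2.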

By the way, the match query supports a minimum_should_match parameter, which lets you specify how many of the query's terms need to match.
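A toy model of `minimum_should_match`, again assuming a regex stand-in for the analyzer. `match_hits` is hypothetical and only counts matching tokens; the `body` dict shows the shape of a real match query request using the parameter.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


def match_hits(doc_text, query_text, minimum_should_match=1):
    """Toy match query (hypothetical): count how many analyzed query tokens
    occur among the document's analyzed tokens, then apply the threshold."""
    doc_tokens = set(analyze(doc_text))
    matched = sum(1 for token in analyze(query_text) if token in doc_tokens)
    return matched >= minimum_should_match


print(match_hits("Source1", "Source1 Source2"))                          # True: 1 of 2 tokens
print(match_hits("Source1", "Source1 Source2", minimum_should_match=2))  # False: needs both

# Request body for a match query that requires both terms to match.
body = {
    "query": {
        "match": {
            "Element.sourceSystem.name": {
                "query": "Source1 Source2",
                "minimum_should_match": 2,
            }
        }
    }
}
```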

Here is the key point about the terms query: it does not analyze the text you provide. It is usually meant to be used on fields of type keyword. Keyword fields are not analyzed either (for further information, please read the documentation on mapping types; it is actually pretty important). So what does this mean?

  • If I take my example from above, my index would contain "this", "is", "my", "input", "text", "in", "english".
  • A match query with English will match, because it will be analyzed to english.
  • A term or terms query with English will never match, because there is no term English in my index. It is case-sensitive.
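The points above can be reproduced with an in-memory toy index in Python. The `match` and `terms` functions are hypothetical stand-ins that model only the analysis difference, not Elasticsearch's scoring or query execution.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


# Terms stored in the inverted index for a text field after analysis.
indexed_terms = set(analyze("This is my input text in English."))


def match(query_text):
    # match: the query text is analyzed the same way as the field.
    return any(token in indexed_terms for token in analyze(query_text))


def terms(values):
    # terms: values are compared verbatim, with no analysis (case-sensitive).
    return any(value in indexed_terms for value in values)


print(match("English"))    # True: analyzed to "english"
print(terms(["English"]))  # False: no term "English" in the index
print(terms(["english"]))  # True: exact indexed term
```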

I am very confident that if you used source1 in your terms query, it would match something. However, I highly doubt that this query is the right approach for your use case. Use normal match queries when querying text fields and (in general, though not always applicable) use terms queries only on keyword fields.
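Applied to the question, a toy simulation shows why the combined query comes back empty. It assumes the field was created by Elasticsearch's default dynamic mapping, which indexes a string as `text` plus a `keyword` sub-field, here `Element.sourceSystem.name.keyword`; check the actual mapping before relying on that name. The documents and helper functions are hypothetical.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


TEXT_FIELD = "Element.sourceSystem.name"
KEYWORD_FIELD = "Element.sourceSystem.name.keyword"  # assumed sub-field

# Hypothetical documents, each reduced to its source system name.
docs = ["Source1", "Source2", "Source3"]


def index(name):
    # The text field stores analyzed terms; the keyword field stores the raw value.
    return {TEXT_FIELD: set(analyze(name)), KEYWORD_FIELD: {name}}


def match(field, text):
    return lambda fields: any(t in fields[field] for t in analyze(text))


def terms(field, values):
    # No analysis: values are looked up exactly as given.
    return lambda fields: any(v in fields[field] for v in values)


def search(must):
    # bool "must": a document is a hit only if every clause matches.
    return [name for name in docs if all(clause(index(name)) for clause in must)]


broken = search([match(TEXT_FIELD, "Source1 Source2"), terms(TEXT_FIELD, ["Source1"])])
lowercased = search([match(TEXT_FIELD, "Source1 Source2"), terms(TEXT_FIELD, ["source1"])])
fixed = search([match(TEXT_FIELD, "Source1 Source2"), terms(KEYWORD_FIELD, ["Source1"])])
print(broken)      # []
print(lowercased)  # ['Source1']
print(fixed)       # ['Source1']

# Request body for the keyword variant (assumes the .keyword sub-field exists).
fixed_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {TEXT_FIELD: "Source1 Source2"}},
                {"terms": {KEYWORD_FIELD: ["Source1"]}},
            ]
        }
    }
}
```

With an explicit mapping, the keyword field may be named differently or may not exist at all, in which case it would have to be added to the mapping first.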

