Reputation: 100
I'm struggling with this, which I feel should work but maybe I'm doing something stupid. This search:
{
"query":
{
"bool":
{
"must":[
{"match":{"Element.sourceSystem.name":"Source1 Source2"}}
]
}
}
}
returns data for both Source1 and Source2. When I add a terms query, as below, I would expect to get a subset of the first search, with only the Source1 documents returned. Instead nothing is returned, whether the terms query is run together with the first query or on its own.
{
"query":
{
"bool":
{
"must":[
{"match":{"Element.sourceSystem.name":"Source1 Source2"}},
{"terms":{"Element.sourceSystem.name":["Source1"]}}
]
}
}
}
I realise this is hard without seeing the documents, but suffice it to say that "Element.sourceSystem.name" exists and is available as the first search works fine - all input gratefully received.
Upvotes: 2
Views: 444
Reputation: 1234
Some things are handled differently in match queries than in terms queries.
First of all, a detour to analyzers:
I assume you are using the standard analyzer of Elasticsearch, which consists of a standard tokenizer and some token filters. The standard tokenizer will tokenize (split your text into terms) on spaces, punctuation marks and some other special characters. Details can be found in the Elasticsearch documentation, so for now let's just say 'each word will be a term'.
The second, very important part of the analyzer is the lowercase filter. It transforms terms into lowercase. This means that, later on, searching for Source1 and source1 should yield the same results.
So a short example:
The input "This is my input text in English." will be analyzed and end up as the following terms: "this", "is", "my", "input", "text", "in", "english".
All of this happens when you index a document into a text field, for example. I assume Element.sourceSystem.name is of this type, since your plain match query seems to work.
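The analysis step described above can be approximated in a few lines of Python. This is only a rough sketch: the function name analyze is made up for illustration, and the real standard tokenizer follows Unicode text segmentation rules that a simple regular expression does not reproduce.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: tokenize, then lowercase."""
    # Assumption: a token is a run of letters, digits or underscores.
    # The real tokenizer handles many more cases than this regex does.
    tokens = re.findall(r"\w+", text)
    # Lowercase filter: every term is stored in lowercase.
    return [token.lower() for token in tokens]

print(analyze("This is my input text in English."))
# ['this', 'is', 'my', 'input', 'text', 'in', 'english']
print(analyze("Source1 Source2"))
# ['source1', 'source2']
```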
Now, when you issue a match query with "Source1 Source2", the analysis will also happen and transform it into the tokens source1 and source2. Internally it will then create 2 term queries in a boolean OR. So either source1 or source2 must match to be a result of your query.
By the way, the match query supports a minimum_should_match property. With it you can specify how many terms of your match query need to match.
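To make the OR behaviour and minimum_should_match more concrete, here is a toy model in Python. The helpers analyze and match are hypothetical and only imitate the idea. Elasticsearch also accepts percentages and other forms for minimum_should_match, which this sketch ignores. The request_body at the end uses the long form of the match query with the field name from the question.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

def match(indexed_terms, query_text, minimum_should_match=1):
    """Toy model of a match query against the indexed terms of one field."""
    query_terms = analyze(query_text)  # the query text is analyzed too
    hits = sum(1 for term in query_terms if term in indexed_terms)
    return hits >= minimum_should_match

# Hypothetical document whose field contains only "Source1".
doc_terms = set(analyze("Source1"))

print(match(doc_terms, "Source1 Source2"))                          # True: one term is enough
print(match(doc_terms, "Source1 Source2", minimum_should_match=2))  # False: both terms required

# How the same option looks in a request body.
request_body = {
    "query": {
        "match": {
            "Element.sourceSystem.name": {
                "query": "Source1 Source2",
                "minimum_should_match": 2,
            }
        }
    }
}
```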
Here is the catch with the terms query: it does not analyze the text you provide. It is usually meant to be used on fields of type keyword. Keyword fields are not analyzed either (for further information, please read the documentation of mapping types - it is actually pretty important). So what does this mean?
Take the example text from above, indexed into a text field. The index then contains the terms "this", "is", "my", "input", "text", "in", "english".
A match query for English will match, because the query text will be analyzed to english.
A terms query for English will never match, because there is no term English in the index. The terms query is case sensitive.
I am fairly sure that if you used source1 in your terms query, it would match something. However, I highly doubt that your query is the way to go for your use case. Try using normal match queries when querying text fields and (in general - not always applicable) only use terms queries on keyword fields.
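The difference can be illustrated with a small Python sketch. The helpers analyze, match and terms are hypothetical toy functions, not Elasticsearch APIs, and the field is assumed to be a text field using the standard analyzer. The final request_body is one possible rewrite of the query from the question using only a match query; whether it fits depends on the actual mapping, which is not shown in the question.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

def match(indexed_terms, query_text):
    # match: the query text is analyzed first, then the terms are ORed.
    return any(term in indexed_terms for term in analyze(query_text))

def terms(indexed_terms, values):
    # terms: no analysis, each value is compared exactly as given.
    return any(value in indexed_terms for value in values)

# A text field holding "Source1" ends up with the term "source1" in the index.
indexed = set(analyze("Source1"))

print(match(indexed, "Source1"))    # True:  analyzed to "source1"
print(terms(indexed, ["Source1"]))  # False: "Source1" is not an indexed term
print(terms(indexed, ["source1"]))  # True:  exact, lowercase term

# Possible rewrite of the original search that returns only Source1.
request_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"Element.sourceSystem.name": "Source1"}}
            ]
        }
    }
}
```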
Upvotes: 3