Reputation: 2316
The reason for that is that we have an API that receives query parameters from the client and builds an Elasticsearch query from them. Depending on the type of user (financial adviser, investor, etc.), we have to apply additional conditions to restrict the search. Unfortunately we can't change the structure of the index (e.g. add extra columns), because the index is not managed by us and our API knows nothing about it except the column names, which are configurable.
Here is an example. A request is received to search on 'investorDateOfBirth' and 'financialAdviserId', and because the search comes from an adviser we add this condition programmatically:
financialAdviserId must be '123' (the id of the current user)
So the final query becomes:
{
  "bool" : {
    "must" : [
      {
        "term" : {
          "financialAdviserId" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      }
    ],
    "should" : [
      {
        "term" : {
          "investorDateOfBirth" : {
            "value" : "1987-11-12",
            "boost" : 1.0
          }
        }
      },
      {
        "term" : {
          "financialAdviserId" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      }
    ],
    "disable_coord" : false,
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
As you can see, there are two 'financialAdviserId' clauses: one ('should') is built programmatically from the request query parameters and one ('must') is added based on the current user. But as you know, this returns the items with the specified investorDateOfBirth as well as all other items with adviser id 123 (including those that don't have the same DOB).
So imagine there are 3 records in the index:
| investorDateOfBirth | financialAdviserId | investorId |
|---|---|---|
| "1987-11-12" | 123 | 111 |
| "1900-11-12" | 123 | 222 |
| "1900-11-12" | 123 | 333 |
The above query returns all 3 rows, which is not the result we want. The following query, however, returns only the first row, which is what we expect:
{
  "bool" : {
    "must" : [
      {
        "term" : {
          "financialAdviserId" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      }
    ],
    "should" : [
      {
        "term" : {
          "investorDateOfBirth" : {
            "value" : "1987-11-12",
            "boost" : 1.0
          }
        }
      }
    ],
    "disable_coord" : false,
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}
How can we tackle this problem? How can we change the first query to get the same result as the second query (which returns only the first row)?
Just to let you know, we can't make financialAdviserId not searchable, because there are other entities that can search on that column. Is there a way to create a subset (in our case, the subset where financialAdviserId is 123) and then execute the query requested by the client against that subset?
We are using Elasticsearch v5.5.3 with Java 8.
Reputation: 6066
You're almost there. To get the expected behavior you may nest one bool into the other:
{
  "bool": {
    "must": [
      {
        "term": {
          "financialAdviserId": {
            "value": "123"
          }
        }
      },
      {
        "bool": {
          "should": [
            {
              "term": {
                "investorDateOfBirth": {
                  "value": "1987-11-12"
                }
              }
            },
            {
              "term": {
                "financialAdviserId": {
                  "value": "123"
                }
              }
            }
          ]
        }
      }
    ]
  }
}
(I removed the boosts and other details to make the idea clearer.)
Now let me explain why the initial query does not work.
You used must and should in the same instance of the bool query. The documented behavior in this case is the following:
should
If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match.
(This is also why the suggestion from Federico to use filter does not solve the issue.)
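To see the quoted rule in action, here is a minimal Python sketch that mimics how a bool query decides whether a document matches. It is not Elasticsearch code: `matches` and `search` are hypothetical helpers that only understand term and bool (must, filter, should) in query context and ignore scoring and must_not. The documents are the three sample rows from the question, with all values stored as strings, and the user query is reduced to the date-of-birth clause to keep the example small.

```python
def matches(query, doc):
    """Return True if doc satisfies query (simplified: term and bool only)."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, spec = next(iter(body.items()))
        # Accept both {"field": "x"} and {"field": {"value": "x"}}.
        value = spec["value"] if isinstance(spec, dict) else spec
        return doc.get(field) == value
    if kind == "bool":
        required = body.get("must", []) + body.get("filter", [])
        optional = body.get("should", [])
        if not all(matches(q, doc) for q in required):
            return False
        if required or not optional:
            # With must/filter present, should clauses only affect scoring.
            return True
        # A bool with only should clauses needs at least one of them to match.
        return any(matches(q, doc) for q in optional)
    raise ValueError("unsupported query type: " + kind)


# Sample rows from the question (values kept as strings for simplicity).
DOCS = [
    {"investorDateOfBirth": "1987-11-12", "financialAdviserId": "123", "investorId": "111"},
    {"investorDateOfBirth": "1900-11-12", "financialAdviserId": "123", "investorId": "222"},
    {"investorDateOfBirth": "1900-11-12", "financialAdviserId": "123", "investorId": "333"},
]

adviser = {"term": {"financialAdviserId": {"value": "123"}}}
dob = {"term": {"investorDateOfBirth": {"value": "1987-11-12"}}}


def search(query):
    """Return the investorId of every sample document that matches."""
    return [d["investorId"] for d in DOCS if matches(query, d)]


# must and should side by side: the should clause is optional.
flat = {"bool": {"must": [adviser], "should": [dob]}}
# should wrapped in its own bool: at least one should clause has to match.
nested = {"bool": {"must": [adviser, {"bool": {"should": [dob]}}]}}

print(search(flat))    # ['111', '222', '333']
print(search(nested))  # ['111']
```

In the flat form the date-of-birth clause can only influence ranking, while in the nested form it becomes a real condition.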
So actually the query you applied had the following logical meaning:
query_restricting_set_of_docs AND (user_query OR True)
And you were looking for this instead:
query_restricting_set_of_docs AND user_query
In your case user_query appears to look like this:
query_restricting_set_of_docs OR some_other_query
which brings us to the final expression:
query_restricting_set_of_docs AND (
query_restricting_set_of_docs OR some_other_query
)
which translates into an ES bool query like this:
{
  "bool": {
    "must": [
      {
        ...query_restricting_set_of_docs
      },
      {
        "bool": {
          "should": [
            {
              ...query_restricting_set_of_docs
            },
            {
              ...some_other_query
            }
          ]
        }
      }
    ]
  }
}
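Since your API builds the query programmatically, the template above can be applied as a small wrapping step. Here is a rough sketch in Python with plain dicts (in Java you would do the same with your query builders); `restrict` is a hypothetical helper name, and the two example queries are the ones from your question without the boosts.

```python
import json


def restrict(restriction, user_query):
    """Return a bool query matching documents that satisfy both arguments."""
    # Both clauses sit in must, so the user query is no longer optional.
    return {"bool": {"must": [restriction, user_query]}}


# Condition added by the API for the current user.
restriction = {"term": {"financialAdviserId": {"value": "123"}}}

# Query built from the client's request parameters.
user_query = {
    "bool": {
        "should": [
            {"term": {"investorDateOfBirth": {"value": "1987-11-12"}}},
            {"term": {"financialAdviserId": {"value": "123"}}},
        ]
    }
}

print(json.dumps(restrict(restriction, user_query), indent=2))
```

The point of the design is that the client's query is passed through untouched as one clause, so the wrapper does not need to know what is inside it.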
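The main differences between filter and query context are that clauses in query context contribute to the relevance score, while clauses in filter context only answer yes or no, compute no score, and can have their results cached.
The caching part will make searches faster, though without the relevance score you won't be able to show more relevant documents first. In your case you would probably like to put the query_restricting_set_of_docs into the filter context.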
To do so you might use the following query:
{
  "bool": {
    "must": [
      {
        "bool": {
          "filter": [
            {
              "term": {
                "financialAdviserId": {
                  "value": "123"
                }
              }
            }
          ]
        }
      },
      {
        "bool": {
          "should": [
            {
              "term": {
                "investorDateOfBirth": {
                  "value": "1987-11-12"
                }
              }
            },
            {
              "term": {
                "financialAdviserId": {
                  "value": "123"
                }
              }
            }
          ]
        }
      }
    ]
  }
}
Here we wrap query_restricting_set_of_docs into another bool with filter, thus achieving filter context for the filtering part.
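A slightly flatter variant, as a sketch: a bool query also accepts a filter clause next to must, so the restriction could go there directly instead of being wrapped in its own bool. I would expect it to match the same documents as the query above, but treat that as an assumption to verify on your cluster. `restrict_with_filter` is a hypothetical helper name.

```python
import json


def restrict_with_filter(restriction, user_query):
    """Put the restriction in filter context and keep the user query scored."""
    return {
        "bool": {
            # filter clauses must match but do not contribute to the score.
            "filter": [restriction],
            # must clauses have to match and are scored as usual.
            "must": [user_query],
        }
    }


restriction = {"term": {"financialAdviserId": {"value": "123"}}}
user_query = {
    "bool": {
        "should": [
            {"term": {"investorDateOfBirth": {"value": "1987-11-12"}}},
            {"term": {"financialAdviserId": {"value": "123"}}},
        ]
    }
}

print(json.dumps(restrict_with_filter(restriction, user_query), indent=2))
```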
In cases where you have control over your index and have a few different subsets of the index that you want to restrict searches to, you may use Filtered Aliases, which basically add the specified filter to all queries executed against that alias.
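For illustration, a filtered alias is created by sending an "add" action with a filter to the _aliases endpoint. The sketch below only builds that request body in Python; the index name "investors", the alias name and the helper `filtered_alias_actions` are made up for the example, and the filtered field has to exist in the index mapping.

```python
import json


def filtered_alias_actions(index, alias, adviser_id):
    """Build a request body for POST /_aliases that adds a filtered alias."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Every search against the alias gets this filter applied.
                    "filter": {"term": {"financialAdviserId": adviser_id}},
                }
            }
        ]
    }


# Hypothetical names; replace with your real index and alias.
body = filtered_alias_actions("investors", "investors_adviser_123", "123")
print(json.dumps(body, indent=2))
```

Searches would then go to the alias instead of the index, and the client's query could be sent as-is.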
Hope that helps!