ketan

Reputation: 2904

Elasticsearch dsl OR query formation

I have an index with multiple documents. The documents contain the fields below:

I want to create an Elasticsearch DSL query. Two inputs are available for it, adhar_number and pan_number, and the query should combine them with an OR condition.

Example: if a document matches only the provided adhar_number, I want that document returned too.

I have a dictionary (my_dict) with the contents below:

{
  "adhar_number": "123456789012",
  "pan_number": "BGPPG4315B"
}

I tried the following:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
s = Search(using=es, index="my_index")
for key, value in my_dict.items():
   s = s.query("match", **{key:value})

print(s.to_dict())
response = s.execute()
print(response.to_dict())

It creates the query below:

{
  'query': {
    'bool': {
      'must': [
        {
          'match': {
            'adhar_number': '123456789012'
          }
        },
        {
          'match': {
            'pan_number': 'BGPPG4315B'
          }
        }
      ]
    }
  }
}

The code above gives me results with an AND condition instead of an OR condition.

Please suggest how to build this query with an OR condition.

Upvotes: 4

Views: 16953

Answers (2)

ifo20

Reputation: 788

To fix the ES query itself, all you need to do is use 'should' instead of 'must':

{
  'query': {
    'bool': {
      'should': [
        {
          'match': {
            'adhar_number': '123456789012'
          }
        },
        {
          'match': {
            'pan_number': 'BGPPG4315B'
          }
        }
      ]
    }
  }
}

To achieve this in Python, see the following example from the docs. Chained .query() calls are combined with AND by default, but you can combine queries with OR as shown below.

Query combination

Query objects can be combined using logical operators:

Q("match", title='python') | Q("match", title='django')
# {"bool": {"should": [...]}}

Q("match", title='python') & Q("match", title='django')
# {"bool": {"must": [...]}}

~Q("match", title="python")
# {"bool": {"must_not": [...]}} 

When you call the .query() method multiple times, the & operator will be used internally:

s = s.query().query()
print(s.to_dict())
# {"query": {"bool": {...}}}

If you want to have precise control over the query form, use the Q shortcut to directly construct the combined query:

q = Q('bool',
    must=[Q('match', title='python')],
    should=[Q(...), Q(...)],
    minimum_should_match=1
)
s = Search().query(q)

So you want something like

q = Q('bool', should=[Q('match', **{key: value}) for key, value in my_dict.items()])
s = Search(using=es, index="my_index").query(q)

Upvotes: 13

thisismydesign

Reputation: 25102

You can use should, as @ifo20 also mentioned. Note that you most likely want to define the minimum_should_match parameter as well:

You can use the minimum_should_match parameter to specify the number or percentage of should clauses returned documents must match.

If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.

{
  'query': {
    'bool': {
      'should': [
        {
          'match': {
            'adhar_number': '123456789012'
          }
        },
        {
          'match': {
            'pan_number': 'BGPPG4315B'
          }
        }
      ],
      'minimum_should_match': 1
    }
  }
}
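
If you would rather build this body yourself, here is a minimal sketch using plain dictionaries only. build_or_query is a hypothetical helper name, not part of any Elasticsearch library; the resulting dict is the same shape as the JSON above and could be sent as the body of a search request.

```python
import json


def build_or_query(fields, minimum_should_match=1):
    """Return a bool/should search body with one match clause per field.

    `fields` maps field names to the values to match. This helper is
    illustrative only and is not provided by any Elasticsearch package.
    """
    should = [{"match": {name: value}} for name, value in fields.items()]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": minimum_should_match,
            }
        }
    }


# Input dictionary from the question.
my_dict = {"adhar_number": "123456789012", "pan_number": "BGPPG4315B"}
body = build_or_query(my_dict)
print(json.dumps(body, indent=2))
```

Setting minimum_should_match explicitly keeps the OR behaviour even if must or filter clauses are later added to the same bool query, where the default would otherwise drop to 0.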

Note also that the should clause contributes to the final score. I don't know how to avoid this, but you may not want scoring to be part of OR logic.
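
On the scoring point, one option that may help is to nest the should clauses inside a filter clause, since clauses in filter context decide whether a document matches but are not scored. The sketch below uses plain dictionaries; build_unscored_or_query is a hypothetical helper name, and whether dropping the score is desirable depends on your use case.

```python
def build_unscored_or_query(fields):
    """Return a search body that ORs the match clauses in filter context.

    The inner bool query carries the OR logic; wrapping it in `filter`
    means it only decides whether a document matches. Illustrative
    helper, not part of any Elasticsearch package.
    """
    should = [{"match": {name: value}} for name, value in fields.items()]
    or_clause = {"bool": {"should": should, "minimum_should_match": 1}}
    return {"query": {"bool": {"filter": [or_clause]}}}


# Input dictionary from the question.
my_dict = {"adhar_number": "123456789012", "pan_number": "BGPPG4315B"}
body = build_unscored_or_query(my_dict)
print(body)
```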

Upvotes: 0
