labourday

Reputation: 81

Searching Nested Document

I am having trouble searching nested documents using the elasticsearch_dsl and elasticsearch libraries in Python.

I can successfully perform searches on top-level (i.e. non-nested) parts of the documents, but all of my attempts to search the nested portions fail for one reason or another.

I have scoured StackOverflow & the web for a definitive guide to searching nested documents using Python, but keep coming up short.

Here is a sample document that I am using:

{
    "username": "nancy",
    "codeData": [
        {"code": "B1", "order": "2"},
        {"code": "L4", "order": "1"}
    ]
}

I have 7 documents in an index, which I have mapped like this:

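from elasticsearch import Elasticsearch

# Assumption: a client for a cluster reachable at the default local address.
es = Elasticsearch()

request_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
    },
    "mappings": {
        "testNesting": {
            "properties": {
                "username": {"type": "text"},
                # codeData is mapped as a nested field with two text sub-fields
                "codeData": {
                    "type": "nested",
                    "properties": {
                        "code": {"type": "text"},
                        "order": {"type": "text"}
                    }
                }
            }
        }
    }
}
es.indices.create(index="nest-test6", body=request_body)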

Performing the following search works correctly:

from elasticsearch_dsl import Search, Q

s = Search(using=es).query("match", username="nancy")
response = s.execute()
print(response.to_dict())

Now, I want to try searching for documents that have code = "B1" within "codeData".

I have listed the sources that I've tried to use at the bottom of this question. My hope is that this can become a definitive guide that people can reference when trying to query nested documents using Python.

Here is what I have tried so far:

q = Q("match", code = "L4")
s = Search(using = es, index = "nest-test6").query("nested", path = "codeData", query = q)

The above fails with a TransportError (400, "failed to create query"); the message then lists the query itself, with a \n after each item.

q = Q("match", **{"codeData.code"" : "L4"})
s = Search(using = es, index = "nest-test6").query("nested", path = "codeData", query = q)

Above results in a syntax error on line 1.

s = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = **Q{"match":{ "lithology":"L4"}})

Above results in a syntax error as well.

I've tried several other approaches, but I have since changed my data structure, so listing them here would not make sense in the context of the document above.

I have no idea how to go about querying these nested objects. There are several pieces of information I feel I am missing:

  1. What are the Q/F keywords, and how do I use them?
  2. I understand that I have to specify the path to the queried term by using level1.nameOfObjectBeingQueried. Given that a dotted name like this is not a valid keyword argument in Python, how do I handle it?

If there are any other sources I am missing, I would really appreciate someone pointing me towards them!

Additional Attempts Which Failed

s1 = Search(using = es).query("match", username = "nancy")
q1 = Q("match", lithologyData__lithology = "L4")
q2 = Q("match", **{"lithologyData.lithology":"L4"})
s2 = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = Q("match",lithologyData__lithology="L4"))
s3 = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = q1)
s4 = Search(using = es, index = "nest-test6").query("nested", path = "lithologyData", query = q2)
response = s1.execute()
response2 = s2.execute()
response3 = s3.execute()
response4 = s4.execute()

Response 1: Works

Response 2: Fails with:

TransportError(400, u'search_phase_execution_exception', u'failed to create query: {\n  "nested" : {\n    "query" : {\n      "match" : {\n        "codeData.code" : {\n          "query" : "L4",\n          "operator" : "OR",\n          "prefix_length" : 0,\n          "max_expansions" : 50,\n          "fuzzy_transpositions" : true,\n          "lenient" : false,\n          "zero_terms_query" : "NONE",\n          "auto_generate_synonyms_phrase_query" : true,\n          "boost" : 1.0\n        }\n      }\n    },\n    "path" : "codeData",\n    "ignore_unmapped" : false,\n    "score_mode" : "avg",\n    "boost" : 1.0\n  }\n}')

Response 3: Fails with:

TransportError(400, u'search_phase_execution_exception', u'failed to create query: {\n  "nested" : {\n    "query" : {\n      "match" : {\n        "codeData.code" : {\n          "query" : "L4",\n          "operator" : "OR",\n          "prefix_length" : 0,\n          "max_expansions" : 50,\n          "fuzzy_transpositions" : true,\n          "lenient" : false,\n          "zero_terms_query" : "NONE",\n          "auto_generate_synonyms_phrase_query" : true,\n          "boost" : 1.0\n        }\n      }\n    },\n    "path" : "codeData",\n    "ignore_unmapped" : false,\n    "score_mode" : "avg",\n    "boost" : 1.0\n  }\n}')

Response 4: Fails with:

TransportError(400, u'search_phase_execution_exception', u'failed to create query: {\n "nested" : {\n "query" : {\n "match" : {\n "codeData.code" : {\n "query" : "L4",\n "operator" : "OR",\n "prefix_length" : 0,\n "max_expansions" : 50,\n "fuzzy_transpositions" : true,\n "lenient" : false,\n "zero_terms_query" : "NONE",\n "auto_generate_synonyms_phrase_query" : true,\n "boost" : 1.0\n }\n }\n },\n "path" : "codeData",\n "ignore_unmapped" : false,\n "score_mode" : "avg",\n "boost" : 1.0\n }\n}')
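
The \n sequences in these messages are escaped newlines inside the error's reason string, and the text after "failed to create query:" is the JSON form of the query that was rejected. As a minimal sketch, using an abbreviated copy of that reason string (most of the match options are omitted, and the variable names are only illustrative), the payload can be split off and parsed so it is easier to read:

```python
import json

# Abbreviated copy of the reason string from the TransportError above;
# most of the match options are left out here for brevity.
reason = (
    'failed to create query: {\n  "nested" : {\n    "query" : {\n'
    '      "match" : {\n        "codeData.code" : {\n'
    '          "query" : "L4",\n          "operator" : "OR"\n        }\n      }\n'
    '    },\n    "path" : "codeData",\n    "score_mode" : "avg"\n  }\n}'
)

# Everything from the first "{" onward is the JSON form of the rejected query.
prefix, brace, rest = reason.partition("{")
rejected_query = json.loads(brace + rest)

print(prefix.strip())
print(json.dumps(rejected_query, indent=2))
```

In the parsed payload the inner field is already codeData.code, which suggests the dotted field name was built as intended on the client side. The message itself does not say why the query was rejected.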

Other Resources Examined

ElasticSearch Nested Query Reference

Github Issue on ElasticSearch_DSL py

ElasticSearch_DSL Python Documentation - Although this is useful, there is not a single example of a nested search/query in the documentation.

Upvotes: 3

Views: 2202

Answers (1)

Honza Král

Reputation: 3022

To query a nested field, you seem to have the right approach with:

q = Q("match", codeData__code="L4")
s = Search(using=es, index="nest-test6").query("nested", path="codeData", query=q)

Any __ in a keyword argument passed to Q is translated to . internally. Alternatively, you can always rely on Python keyword-argument expansion:

q = Q('match', **{"codeData.code": "L4"})

This should work just as well. Your example had an extra " in it, which is why Python rejected it.
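
To illustrate why the two spellings are equivalent, here is a small standard-library sketch. The helpers match and nested are hypothetical stand-ins written for this example, not the elasticsearch_dsl implementation. They only mimic the __ to . translation described above and build the kind of request body a nested query uses:

```python
import json


def match(**fields):
    # Hypothetical stand-in for Q("match", ...): double underscores in
    # keyword names become dots, mimicking the translation described above.
    return {"match": {name.replace("__", "."): value for name, value in fields.items()}}


def nested(path, query):
    # Wrap an inner query in a "nested" query for the given path.
    return {"nested": {"path": path, "query": query}}


# A dotted name cannot be written as a literal keyword argument, but it can
# be passed through ** expansion because the dict key is an ordinary string.
via_underscores = match(codeData__code="L4")
via_expansion = match(**{"codeData.code": "L4"})

body = {"query": nested("codeData", via_underscores)}
print(via_underscores == via_expansion)
print(json.dumps(body, indent=2))
```

Both calls yield the same inner match query on codeData.code, so either form can be passed as the query argument of the nested query.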

Upvotes: 6
