kopi_b
kopi_b

Reputation: 1483

How to query all fields of a document with nested fields in OpenSearch (or Elasticsearch)?

I'm currently using version 2.17.1 and referring to this official documentation. OpenSearch - Query DSL - Joining Queries - Nested query - Multi-level nested queries.

Using this, I'm able to query for results where all terms must match in a single nested item.

But after introducing the nested fields, I'm no longer able to just search in every single field in the whole document with a simple string query like "(John) AND (Jane)".

Is there a way to query in a list of items where all terms must match in a single item to return the parent where necessary, but still be able to query for the terms all over the document?

--------- Update: Long version -----------

Still referring to the official documentation found here: OpenSearch - Query DSL - Joining Queries - Nested query - Multi-level nested queries as this matches one part of my requirements.

Let's say I've a document like this one:

  "patient": {
    "name": "John Doe",
    "contacts": [
      {
        "name": "Jane Doe",
        "relationship": "mother",
        "phone": "5551111"
      },
      {
        "name": "Joe Doe",
        "relationship": "father",
        "phone": "5552222"
      }
    ]
  }

When I search for 'jane' and 'father' I don't expect to find this document, because although both terms are present, they are in different items in the contracts list. I only expect to find it when the terms match within one item, like when searching for 'jane' and 'mother'.

As far as I understand it, to be able to query for patients where my search terms must match in a single 'contact', I've to declare them as type 'nested' like:

  "mappings": {
    "properties": {
      "patient": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text"
          },
          "contacts": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "text"
              },
              "relationship": {
                "type": "text"
              },
              "phone": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }

Using this mapping and a nested query like:

  "query": {
    "nested": {
      "path": "patient",
      "query": {
        "nested": {
          "path": "patient.contacts",
          "query": {
            "bool": {
              "must": [
                { "match": { "patient.contacts.relationship": "mother" } },
                { "match": { "patient.contacts.name": "Jane" } }
              ]
            }
          }
        }
      }
    }
  }

I get exactly that behavior.

But I lose the possibility to just query over the whole document with a string query like:

   "query": {
     "query_string": {
       "query": "(John) AND (5552222)"
     }
   }
/* 'John' as property of root object of type patient and
   '5552222' as property of one of the contact's properties */

Before I introduced the nested field type, such a query would find the document just fine which is the second part of the requirement because sometimes I know which term belongs to which field and can go for the nested query, but other times I don't and therefor can't use them and want to fallback to a search over all fields of the document.

Is this possible? Is maybe the nested type the wrong approach? Is using the nested type correct but I need something other than a string query?

Thank you, kopi

Upvotes: 1

Views: 226

Answers (1)

Musab Dogan
Musab Dogan

Reputation: 3680

To search multiple fields at once, especially with a single query, it'd be better to normalize the data by keeping them on the same level. To do that, you can enrich the data. Here is my suggestion:

  1. Keep one doc per patient and save it as text field.
  2. Have multiple contact per patient as nested field to secure the relation in each sub-fields.

PUT _ingest/pipeline/add_patient_name
{
  "processors": [
    {
      "set": {
        "field": "_temp_patient_name",
        "value": "{{patient.name}}"
      }
    },
    {
      "foreach": {
        "field": "patient.contacts",
        "processor": {
          "set": {
            "field": "_ingest._value.patient_name",
            "value": "{{_temp_patient_name}}"
          }
        }
      }
    },
    {
      "remove": {
        "field": "_temp_patient_name"
      }
    }
  ]
}

PUT hospital
{
  "mappings": {
    "properties": {
      "patient": {
        "properties": {
          "name": {
            "type": "text"
          },
          "contacts": {
            "type": "nested",
            "properties": {
              "name": {
                "type": "text"
              },
              "relationship": {
                "type": "text"
              },
              "phone": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  },
  "settings": {
    "index.default_pipeline": "add_patient_name"
  }
}

PUT hospital/_doc/1
{
  "patient": {
    "name": "John Doe",
    "contacts": [
      {
        "name": "Jane Doe",
        "relationship": "mother",
        "phone": "5551111"
      },
      {
        "name": "Joe Doe",
        "relationship": "father",
        "phone": "5552222"
      }
    ]
  }
}

GET hospital/_search
{
  "query": {
    "nested": {
      "path": "patient.contacts",
      "query": {
        "query_string": {
          "default_field": "*",
          "query": "John AND 5552222"
        }
      }
    }
  }
}

ingest pipeline

Upvotes: 0

Related Questions