F21

Reputation: 33441

Treat child as field of parent in elastic search query

I am reading the docs for elasticsearch, and this page talks about mapping a child type to a parent type using _parent.

Say I have child documents of type email attached to parent documents of type account, with these fields in each type:

account (http://localhost:9200/myapp/account/1)
========
id
name
some_other_info
state

email (http://localhost:9200/myapp/email/1?parent=1)
========
id
email

After trying imotov's suggestion, I came up with this query:

This is executed on http://localhost:9200/myapp/account/_search

{
  "query": {
    "bool": {
      "must": [
        {
          "prefix": {
            "name": "a"
          }
        },
        {
          "term": {
            "statuses": "active"
          }
        }
      ],
      "should": [
        {
          "has_child": {
            "type": "emailaddress",
            "query": {
              "prefix": {
                "email": "a"
              }
            }
          }
        }
      ]
    }
  }
}

The problem is that the above does not give me any accounts where the email matches.

The effect I want is essentially an OR of the search across the two types, returning the parent (account) documents of the matches.


Test data:

curl -XPUT http://localhost:9200/test/account/1 -d '{
    "name": "John Smith",
    "statuses": "active"
}'

curl -XPUT http://localhost:9200/test/account/2 -d '{
    "name": "Peter Smith",
    "statuses": "active"
}'

curl -XPUT http://localhost:9200/test/account/3 -d '{
    "name": "Andy Smith",
    "statuses": "active"
}'

# Set up mapping for parent/child relationship

curl -XPUT 'http://localhost:9200/test/email/_mapping' -d '{
    "email" : {
        "_parent" : {"type" : "account"}
    }
}'

curl -XPUT 'http://localhost:9200/test/email/1?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/2?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/3?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/4?parent=2' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/5?parent=3' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/6?parent=3' -d '{
    "email": "[email protected]"
}'

imotov's solution worked for me. Another solution I found is to query accounts for status = active, then apply a bool filter to the results, with has_child on the child type and prefix on name inside that bool filter.
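The bool-filter alternative described above could look roughly like the following in the pre-1.0 query DSL used throughout this thread: a filtered query whose query part matches active accounts, and whose bool filter ORs a prefix filter on name with a has_child filter on the email type. This is a sketch, not the asker's exact query; the helper name filtered_account_query is hypothetical, and the prefix "a" and the type name email are taken from the test data.

```shell
# Hypothetical helper: prints a search body for the account type using the
# old (pre-1.0) "filtered" query. The query part selects active accounts;
# the bool filter with only "should" clauses keeps an account if its name
# starts with "a" OR one of its email children starts with "a".
filtered_account_query() {
  cat <<'EOF'
{
  "query": {
    "filtered": {
      "query": { "term": { "statuses": "active" } },
      "filter": {
        "bool": {
          "should": [
            { "prefix": { "name": "a" } },
            {
              "has_child": {
                "type": "email",
                "query": { "prefix": { "email": "a" } }
              }
            }
          ]
        }
      }
    }
  }
}
EOF
}
```

To run it, pipe the body to the search endpoint, e.g. filtered_account_query | curl -XPOST 'http://localhost:9200/test/account/_search' -d @- . Filters are not scored, so unlike the should clauses of a bool query, the prefix and has_child conditions here do not influence the ranking of the returned accounts.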

Upvotes: 21

Views: 17968

Answers (1)

imotov

Reputation: 30163

An important difference between elasticsearch and relational databases is that elasticsearch cannot perform joins. In elasticsearch you are always searching a single index or a union of indices. In the case of a parent/child relationship, however, it's possible to limit the results from the parent type using a query on the child type. For example, you can execute this query on the account type:

{
    "bool": {
        "must": [
            { 
                "text" : { "name": "foo" } 
            }, { 
                "term" : { "state": "active" } 
            }, {
                "has_child": {
                    "type": "email",
                    "query": {
                        "text": {"email": "bar" }
                    }
                }
            }
        ]
    }
}

This query returns only the parent documents (no child documents are returned). You can use the parent id returned by this query to find all children of that parent through the _parent field, which is stored and indexed by default:

{
    "term" : { "_parent": "1" } 
}

Or you can limit the results to only those children whose email field contains the word bar:

{
    "bool": {
        "must": [
            { 
                "term" : { "_parent": "1" } 
            }, { 
                "text" : { "email": "bar" } 
            }
        ]
    }
}
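Both child lookups above are run against the email type rather than account. A small shell sketch that prints either request body for a given parent id; the helper name children_query is hypothetical, and it uses the old "text" query from the examples above.

```shell
# Hypothetical helper: prints the search body for the children of one parent.
#   $1 = parent id (matched against the _parent field)
#   $2 = optional word that must appear in the email field ("text" query)
# No JSON escaping is done, so arguments must not contain quotes or backslashes.
children_query() {
  parent_id=$1
  email_text=$2
  if [ -z "$email_text" ]; then
    # All children of the parent.
    printf '{"query": {"term": {"_parent": "%s"}}}\n' "$parent_id"
  else
    # Only the children of the parent whose email field contains the word.
    printf '{"query": {"bool": {"must": [{"term": {"_parent": "%s"}}, {"text": {"email": "%s"}}]}}}\n' \
      "$parent_id" "$email_text"
  fi
}
```

For example, children_query 1 bar | curl -XPOST 'http://localhost:9200/test/email/_search' -d @- should return only those email children of account 1 whose email field contains the word bar.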

I don't think it's possible to specify the parent in the document JSON unless you are using _bulk indexing; otherwise it has to be passed as the parent URL parameter (for example ?parent=1).
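With _bulk, the parent id goes into each action's metadata line rather than into the document source. A sketch, assuming the pre-1.0 bulk format in which the action metadata accepts a _parent field; bulk_email_action is a hypothetical helper that prints one action line followed by one source line, with the index name test taken from the test data.

```shell
# Hypothetical helper: prints one _bulk "index" action for an email child.
#   $1 = child id, $2 = parent account id, $3 = email address
# The parent id is placed in the action metadata ("_parent"), not in the
# document source. No JSON escaping is done on the arguments.
bulk_email_action() {
  printf '{"index": {"_index": "test", "_type": "email", "_id": "%s", "_parent": "%s"}}\n' "$1" "$2"
  printf '{"email": "%s"}\n' "$3"
}
```

For example, piping the output of bulk_email_action 7 3 someone@example.com into curl -XPOST 'http://localhost:9200/_bulk' --data-binary @- would index child 7 under account 3 (the address is a placeholder). --data-binary is used instead of -d because curl's -d strips newlines from data read with @, and the bulk body is newline-delimited.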

This is how the email lookup can be implemented using the test data provided in the question:

#!/bin/sh
curl -XDELETE 'http://localhost:9200/test' && echo 
curl -XPOST 'http://localhost:9200/test' -d '{
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0
    },
    "mappings" : {
      "account" : {
        "_source" : { "enabled" : true },
        "properties" : {
          "name": { "type": "string", "analyzer": "standard" },
          "statuses": { "type": "string",  "index": "not_analyzed" }
        }
      },
      "email" : {
        "_parent" : {
          "type" : "account"
        },
        "properties" : {
          "email": { "type": "string",  "analyzer": "standard" }
        }
      }
    }
}' && echo

curl -XPUT 'http://localhost:9200/test/account/1' -d '{
    "name": "John Smith",
    "statuses": "active"
}'

curl -XPUT 'http://localhost:9200/test/account/2' -d '{
    "name": "Peter Smith",
    "statuses": "active"
}'

curl -XPUT 'http://localhost:9200/test/account/3' -d '{
    "name": "Andy Smith",
    "statuses": "active"
}'

# Index the email children (the parent/child mapping was created together with the index above)

curl -XPUT 'http://localhost:9200/test/email/1?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/2?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/3?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/4?parent=2' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/5?parent=3' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/6?parent=3' -d '{
    "email": "[email protected]"
}'

curl -XPOST 'http://localhost:9200/test/_refresh'
echo
curl 'http://localhost:9200/test/account/_search' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "statuses": "active"
          }
        }
      ],
      "should": [
        {
          "prefix": {
            "name": "a"
          }
        },
        {
          "has_child": {
            "type": "email",
            "query": {
              "prefix": {
                "email": "a"
              }
            }
          }
        }
      ],
      "minimum_number_should_match" : 1
    }
  }
}' && echo
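In the script above, the must clause restricts every hit to active accounts, while the two should clauses combined with "minimum_number_should_match": 1 require at least one of them to match, which gives the OR the question asks for. In the question's query the name prefix was a must clause, so an account whose name did not start with the prefix could never be returned, even if one of its emails matched. A sketch of the same body with the prefix passed as an argument; account_search_body is a hypothetical helper.

```shell
# Hypothetical helper: prints the search body from the script above, with the
# prefix taken from $1 instead of the literal "a".
# The prefix query is not analyzed, but name and email are indexed with the
# standard analyzer (lowercased terms), so the input is lowercased first.
# No JSON escaping is done: the prefix must not contain quotes or backslashes.
account_search_body() {
  p=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  cat <<EOF
{
  "query": {
    "bool": {
      "must": [
        { "term": { "statuses": "active" } }
      ],
      "should": [
        { "prefix": { "name": "$p" } },
        {
          "has_child": {
            "type": "email",
            "query": { "prefix": { "email": "$p" } }
          }
        }
      ],
      "minimum_number_should_match": 1
    }
  }
}
EOF
}
```

For example, account_search_body Andy | curl -XPOST 'http://localhost:9200/test/account/_search' -d @- sends the lowercased prefix "andy" for both the name and the email clauses.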

Upvotes: 24
