WoJ
WoJ

Reputation: 30035

How to get the number of hits of several matching fields in one record?

I have records similar to

{
    "who": "John",
    "hobby": [
        {"name": "gardening",
         "skills": 2
         },
        {"name": "sleeping",
         "skills": 3
         },
        {"name": "darts",
         "skills": 2
         }
    ]
}
,
{
    "who": "Mary",
    "hobby": [
        {"name": "gardening",
         "skills": 2
         },
        {"name": "volleyball",
         "skills": 3
         },
        {"name": "kung-fu",
         "skills": 2
         }
    ]
}

I am looking at building a query which would answer the question: "how many hobbies with skills=2 do we have?"

The answer for the example above would be 3 ("gardening" is common to both, and each have another unique one).

Every "query" or "query"+"aggs" I tried returns in ['hits']['hits'] or ['aggregations']['sources']['buckets'] the number of matching documents, that is two in the case above (one for "John" and one for "Mary", each of them satisfying the query).

Is there a way to build a query so that it returns the total number of fields (in the example above: the elements of the list "hobby") which matched that query? (fields, not documents)

Note: If my documents were flat:

{"who": "John", "name": "gardening", "skills": 2},
{"who": "John", "name": "sleeping", "skills": 3},
(...)
{"who": "Mary", "name": "kung-fu", "skills": 2}

then a simple "query" to match "skills": 2 + an aggregation on "name" would have done the work

Upvotes: 0

Views: 194

Answers (1)

Val
Val

Reputation: 217474

Yes, you can achieve this with the nested type and using inner_hits and/or nested aggregations.

So here is the mapping you should use:

curl -XPUT localhost:9200/hobbies -d '{
  "mappings": {
    "hob": {
      "properties": {
        "who": {
          "type": "string"
        },
        "hobby": {
          "type": "nested",      <--- the hobby list is of type nested
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            },
            "skills": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}

Then we can insert your two sample documents using the _bulk endpoint like this:

curl -XPOST localhost:9200/hobbies/hob/_bulk -d '
{"index":{}}
{"who":"John", "hobby":[{"name": "gardening","skills": 2},{"name": "sleeping","skills": 3},{"name": "darts","skills": 2}]}
{"index":{}}
{"who":"Mary", "hobby":[{"name": "gardening","skills": 2},{"name": "volley-ball","skills": 3},{"name": "kung-fu","skills": 2}]}
'

And finally, we can query your index for how many hobbies have skills: 2 like this:

curl -XPOST localhost:9200/hobbies/hob/_search -d '{
  "_source": false,
  "query": {
    "nested": {
      "path": "hobby",
      "query": {
        "term": {
          "hobby.skills": 2
        }
      },
      "inner_hits": {}         <---- this will return only the matching nested fields with skills=2
    }
  },
  "aggs": {
    "hobbies": {
      "nested": {
        "path": "hobby"
      },
      "aggs": {
        "skills": {
          "filter": {
            "term": {
              "hobby.skills": 2
            }
          },
          "aggs": {
            "by_field": {         <--- this will return a breakdown of the fields with skills=2
              "terms": {
                "field": "name"
              }
            }
          }
        }
      }
    }
  }
}'

What this query will return you is

  1. In the hits part, the four fields that have skills: 2
  2. In the aggs part, a breakdown of the 3 distinct fields which have skills: 2

Upvotes: 1

Related Questions