brauliobo
brauliobo

Reputation: 6315

Group and count by array of objects' keys

Given the following index definition and query:

curl -XDELETE "localhost:9200/products"
curl -XPUT "localhost:9200/products"
curl -XPUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "opinions": {
      "type": "nested",
      "properties": {
        "topic": {"type": "keyword"},
        "count": {"type": "long"}
      },
      "include_in_parent": true
    }
  }
}'

curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"opinions":[{"topic": "room", "count": 2}, {"topic": "kitchen", "count": 1}]}
{"index":{"_id":2}}
{"opinions":[{"topic": "room", "count": 1}, {"topic": "restroom", "count": 1}]}
'

sleep 2
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "per_topic": {
      "terms": {"field": "opinions.topic"},
      "aggs": {
        "counts": {
          "sum": {"field": "opinions.count"}
        }
      }
    }
  }
}
'

Produces the result:

  "aggregations" : {
    "per_topic" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "room",
          "doc_count" : 2,
          "counts" : {
            "value" : 5.0
          }
        },
        {
          "key" : "kitchen",
          "doc_count" : 1,
          "counts" : {
            "value" : 3.0
          }
        },
        {
          "key" : "restroom",
          "doc_count" : 1,
          "counts" : {
            "value" : 2.0
          }
        }
      ]
    }
  }
}

I'm expecting the sum of room to be 3, kitchen to be 1 and restroom to be 1, counting only the related nested documents, but instead it is summing all the nested count fields in all the matched the documents.

How can I sum only the matched aggregated nested documents?

UPDATE: solution based on comments

curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "opinions": {
      "nested": {"path": "opinions"},
      "aggs": {
        "per_topic": {
          "terms": {"field": "opinions.topic"},
          "aggs": {
            "counts": {
              "sum": {"field": "opinions.count"}
            }
          }
        }
      }
    }
  }
}
'

Upvotes: 2

Views: 486

Answers (1)

dcolazin
dcolazin

Reputation: 1072

The main initial problem was the use of object fields instead of nested fields: only using nested fields is it possible to preserve the structure [{"room", 2}, {"kitchen", 1}], as in object fields the data is flattened to {["room", "kitchen"], [1,2]} without relationships between "room" and 2.

Unluckily, at the moment is not possible to use the SQL API to group by (some?) nested fields, but it is possible to write a native Elastic query using nested aggregations.

Upvotes: 2

Related Questions