Rakesh
Rakesh

Reputation: 466

Elastic search querying nested document array

consider that I have some set of documents(in below-mentioned structure) in an elastic search index

{
  "xid": "1234567",
  "time": "12/5/12 5:49 AM",
  "data": [
    {
      "id": "abc",
      "amount": 400
    },
    {
     "id": "def",
      "amount": 200 
    }
  ]
}
{
  "xid": "1234568",
  "time": "13/5/12 7:23 AM",
  "data": [
    {
      "id": "abc",
      "amount": 400
    },
    {
     "id": "ghi",
      "amount": 300 
    }
  ]
}

now inside the data array in every document, I want to group by id and find the sum.

For the given 2 documents the solution would look something like

{
"id" : "abc",
"total" :800
},
{
"id" : "def",
"total" :200
},
{
"id" : "ghi",
"total" :300
}

Please help me in structuring my request query.
My initial approach was

{
  "aggs": {
    "group_by_id": {
      "terms": {
        "field": "data.id.keyword"
      },
      "aggs": {
       "total" : {
         "sum": {
          "field": "data.amount"
        }
       }
      }
    }
  }
}

The above query result is given below which is not the expected result.

{
"id" : "abc",
"total" :1300
},
{
"id" : "def",
"total" :600
},
{
"id" : "ghi",
"total" :700
}

Upvotes: 1

Views: 85

Answers (1)

Josu Goñi
Josu Goñi

Reputation: 1274

You need to use a nested aggregation, and the type of your field data should be declared as nested within your mapping.

Otherwise Elasticsearch will have the following view of your documents:

{
  "xid": "1234567",
  "time": "12/5/12 5:49 AM",
  "data.id": ["abc", "def"],
  "data.amount": [400, 200]
}

{
  "xid": "1234568",
  "time": "13/5/12 7:23 AM",
  "data.id": ["abc", "ghi"],
  "data.amount": [400, 300]
}

Your new mapping for the data field should look like this:

"data": {
  "type": "nested",
  "properties": {
    "id": {
      "type": "keyword"
    },
    "amount": {
      "type": "float"
    }
  }
}

And now you can do the following aggregation:

{
  "size": 0,
  "aggs": {
    "data": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "group_by_id": {
          "terms": {
            "field": "data.id"
          },
          "aggs": {
            "total": {
              "sum": {
                "field": "data.amount"
              }
            }
          }
        }
      }
    }
  }
}

And this is the result you will get:

"buckets": [
  {
    "key": "abc",
    "doc_count": 2,
    "total": {
      "value": 800
    }
  },
  {
    "key": "def",
    "doc_count": 1,
    "total": {
      "value": 200
    }
  },
  {
    "key": "ghi",
    "doc_count": 1,
    "total": {
      "value": 300
    }
  }
]

Upvotes: 1

Related Questions