Furia 824
Furia 824

Reputation: 11

How to count number of fields inside nested field? - Elasticsearch

I did the following mapping. I would like to count the number of products in each nested field "products" (for each document separately). I would also like to do a histogram aggregation, so that I would know the number of specific bucket sizes.

PUT /receipts
{
  "mappings": {
    "properties": {
      "id" : {
        "type": "integer"
      },
      "user_id" : {
        "type": "integer"
      },
      "date" : {
        "type": "date"
      },
      "sum" : {
        "type": "double"
      },
    "products" : {
      "type": "nested",
      "properties": {
        "name" : {
          "type" : "text"
        },
        "number" : {
          "type" : "double"
        },
        "price_single" : {
          "type" : "double"
        },
        "price_total" : {
          "type" : "double"
        }
      }
    }
}
}
}

I've tried this query, but I get the number of all the products instead of number of products for each document separately.

GET /receipts/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "terms": {
      "nested": {
        "path": "products"
      },
      "aggs": {
        "bucket_size": {
          "value_count": {
            "field": "products"
          }
        }
      }
    }
  }
}

Result of the query:

"aggregations" : {
    "terms" : {
      "doc_count" : 6552,
      "bucket_size" : {
        "value" : 0
      }
    }
  }

UPDATE

Now I have this code where I make separate buckets for each id and count the number of products inside them.

GET /receipts/_search
{
  "query": {
    "match_all": {}
  },
  "size" : 0,
  "aggs": {
    "terms":{
      "terms":{
        "field": "_id"
      },
      "aggs": {
        "nested": {
          "nested": {
            "path": "products"
          },
          "aggs": {
            "bucket_size": {
              "value_count": {
                "field": "products.number"
              }
            }
          }
        }
      }
    }
  }
}

Result of the query:

"aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 5,
      "sum_other_doc_count" : 490,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 1,
          "nested" : {
            "doc_count" : 21,
            "bucket_size" : {
              "value" : 21
            }
          }
        },
        {
          "key" : "10",
          "doc_count" : 1,
          "nested" : {
            "doc_count" : 5,
            "bucket_size" : {
              "value" : 5
            }
          }
        },
        {
          "key" : "100",
          "doc_count" : 1,
          "nested" : {
            "doc_count" : 12,
            "bucket_size" : {
              "value" : 12
            }
          }
        },
...

Is is possible to group these values (21, 5, 12, ...) into buckets to make a histogram of them?

Upvotes: 0

Views: 2085

Answers (1)

Joe - Check out my books
Joe - Check out my books

Reputation: 16943

products is only the path to the array of individual products, not an aggregatable field. So you'll need to use it on one of your product's field -- such as the number:

GET receipts/_search
{
  "size": 0,
  "aggs": {
    "terms": {
      "nested": {
        "path": "products"
      },
      "aggs": {
        "bucket_size": {
          "value_count": {
            "field": "products.number"
          }
        }
      }
    }
  }
}

Note that is a product has no number, it'll not contribute to the total count. It's therefore best practice to always include an ID in each of them and then aggregate on that field.

Alternatively you could use a script to account for missing values. Luckily value_count does not deduplicate -- meaning if two products are alike and/or have empty values, they'll still be counted as two:

GET receipts/_search
{
  "size": 0,
  "aggs": {
    "terms": {
      "nested": {
        "path": "products"
      },
      "aggs": {
        "bucket_size": {
          "value_count": {
            "script": {
              "source": "doc['products.number'].toString()"
            }
          }
        }
      }
    }
  }
}

UPDATE

You could also use a nested composite aggregation which'll give you the histogrammed product count w/ the corresponding receipt id:

GET /receipts/_search
{
  "size": 0,
  "aggs": {
    "my_aggs": {
      "nested": {
        "path": "products"
      },
      "aggs": {
        "composite_parent": {
          "composite": {
            "sources": [
              {
                "receipt_id": {
                  "terms": {
                    "field": "_id"
                  }
                }
              },
              {
                "product_number": {
                  "histogram": {
                    "field": "products.number",
                    "interval": 1
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
}

The interval is modifiable.

Upvotes: 1

Related Questions