knpwrs

Reputation: 16456

Terms Aggregation based on Distinct Terms per Collection

I have documents which look like this:

{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        }
      ]
    }
  ]
}

I want a terms aggregation on fruit in which each bucket's count is the number of bars entries that contain at least one bears entry with that fruit. For instance, given the following documents:

{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        },
        {
          "fruit": "banana"
        },
        {
          "fruit": "apple"
        }
      ]
    },
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        }
      ]
    }
  ]
}
{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "apple"
        },
        {
          "fruit": "apple"
        },
        {
          "fruit": "orange"
        }
      ]
    }
  ]
}

I would like results that look like this:

"buckets": [
  {
    "key": "banana",
    "doc_count": 2
  },
  {
    "key": "apple",
    "doc_count": 2
  },
  {
    "key": "orange",
    "doc_count": 1
  }
]

That is, banana appears as the descendant of 2 distinct bars, apple appears as the descendant of 2 distinct bars, and orange appears as the descendant of 1 distinct bar.

Right now I have the following aggregation, which counts every fruit occurrence rather than the distinct bars:

{
  "aggs": {
    "global": {
      "global": {},
      "aggs": {
        "bars": {
          "nested": {
            "path": "bars"
          },
          "aggs": {
            "bears": {
              "nested": {
                "path": "bars.bears"
              },
              "aggs": {
                "fruits": {
                  "terms": {
                    "field": "bars.bears.fruit"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This results in the following:

"buckets": [
  {
    "key": "banana",
    "doc_count": 3
  },
  {
    "key": "apple",
    "doc_count": 3
  },
  {
    "key": "orange",
    "doc_count": 1
  }
]

This is not what I'm looking for, because a bar that contains the same fruit more than once is counted more than once. Is it possible to modify this query to count the distinct bars which contain each fruit?
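To make the difference between the two counts concrete, here is a minimal sketch in plain Python (not Elasticsearch code; the function names are made up for illustration). It walks the two sample documents, reduced to the fields that matter, and computes both the per-bear count and the per-bar count.

```python
from collections import Counter

# The two sample documents from the question, reduced to bars/bears/fruit.
docs = [
    {"bars": [
        {"bears": [{"fruit": "banana"}, {"fruit": "banana"}, {"fruit": "apple"}]},
        {"bears": [{"fruit": "banana"}]},
    ]},
    {"bars": [
        {"bears": [{"fruit": "apple"}, {"fruit": "apple"}, {"fruit": "orange"}]},
    ]},
]


def count_bears_per_fruit(docs):
    """Count every bears entry, one increment per occurrence of a fruit."""
    counts = Counter()
    for doc in docs:
        for bar in doc["bars"]:
            for bear in bar["bears"]:
                counts[bear["fruit"]] += 1
    return dict(counts)


def count_bars_per_fruit(docs):
    """Count each bar at most once per fruit it contains."""
    counts = Counter()
    for doc in docs:
        for bar in doc["bars"]:
            # A set drops repeated fruits within the same bar.
            counts.update({bear["fruit"] for bear in bar["bears"]})
    return dict(counts)


print(count_bears_per_fruit(docs))  # per-bear totals
print(count_bars_per_fruit(docs))   # distinct bars per fruit
```

The first function corresponds to the counts the aggregation above returns; the second corresponds to the counts being asked for.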

Upvotes: 0

Views: 36

Answers (2)

knpwrs

Reputation: 16456

I actually managed to get the results I was looking for, although in a slightly different shape:

Query

{
  "aggs": {
    "global": {
      "global": {},
      "aggs": {
        "bars": {
          "nested": {
            "path": "bars"
          },
          "aggs": {
            "bears": {
              "nested": {
                "path": "bars.bears"
              },
              "aggs": {
                "fruits": {
                  "terms": {
                    "field": "bars.bears.fruit"
                  },
                  "aggs": {
                    "fruit_to_bears": {
                      "reverse_nested": {
                        "path": "bars"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Results

"buckets": [
  {
    "key": "banana",
    "doc_count": 3,
    "fruit_to_bears": {
      "doc_count": 2
    }
  },
  {
    "key": "apple",
    "doc_count": 3,
    "fruit_to_bears": {
      "doc_count": 2
    }
  },
  {
    "key": "orange",
    "doc_count": 1,
    "fruit_to_bears": {
      "doc_count": 1
    }
  }
]
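If the exact shape from the question is needed, the buckets can be reshaped on the client. The following is only a sketch in Python: `flatten_buckets` is a hypothetical helper, not part of any Elasticsearch client, and it assumes the sub-aggregation is named `fruit_to_bears` as in the result above.

```python
def flatten_buckets(buckets, sub_agg="fruit_to_bears"):
    """Replace each bucket's doc_count with the sub-aggregation's doc_count."""
    return [
        {"key": bucket["key"], "doc_count": bucket[sub_agg]["doc_count"]}
        for bucket in buckets
    ]


# Buckets copied from the result above.
buckets = [
    {"key": "banana", "doc_count": 3, "fruit_to_bears": {"doc_count": 2}},
    {"key": "apple", "doc_count": 3, "fruit_to_bears": {"doc_count": 2}},
    {"key": "orange", "doc_count": 1, "fruit_to_bears": {"doc_count": 1}},
]

for bucket in flatten_buckets(buckets):
    print(bucket)
```

Bucket order is preserved, so any sorting applied by the terms aggregation carries over.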

Upvotes: 0

Bhavya

Reputation: 16172

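Adding a working example with index data (the same as in the question), mapping, search query, and search result. Note that in this mapping only bars is nested; bears is a plain object field, so its fruit values are flattened into each bars nested document and every bar is counted once per fruit.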

Index Mapping:

{
  "mappings": {
    "properties": {
      "bars": {
        "type": "nested",
        "properties": {
          "bears": {
            "properties": {
              "fruit": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

Search Query:

{
  "size": 0,
  "aggs": {
    "bars": {
      "nested": {
        "path": "bars"
      },
      "aggs": {
        "fruits": {
          "terms": {
            "field": "bars.bears.fruit"
          }
        }
      }
    }
  }
}

Search Result:

"aggregations": {
  "bars": {
    "doc_count": 3,
    "fruits": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "apple",
          "doc_count": 2
        },
        {
          "key": "banana",
          "doc_count": 2
        },
        {
          "key": "orange",
          "doc_count": 1
        }
      ]
    }
  }
}

Upvotes: 1
