Mal Curtis
Mal Curtis

Reputation: 695

Return key value from alternate field in aggregation

Is it possible to get Elastic Search to return an aggregate key based on a different document field?

We're placing both a foreign id, and the foreign name in our type, then aggregating over the id, but would like to get the name returned. Names are not unique, so they are not appropriate to aggregate over. I'm aware that they're also not necessarily unique over the record set, but would accept a name sampled from a single record of the set.

For example, say our data is regarding sales of a product. Each sale has the product id and product name associated with it.

// Sales
{ "product_id": 1, "product_name": "Beer", "quantity": 3, … }
{ "product_id": 1, "product_name": "Beer", "quantity": 2, … }
{ "product_id": 2, "product_name": "Wine", "quantity": 6, … }

Query:

"aggregations": {
    "product": {
      "terms": {
        "field": "product_id"
      },
      "aggregations": {
        "day": {
          "count": {
            "value_count": {
              "field": "quantity"
            }
          }
        }
      }
    }
  }
}

Result:

…
"aggregations": {
  "product": {
    "buckets": [
    {
      "key": "1",
      "doc_count": 2,
      "count": {
        "value": 5
      }
    },{
      "key": "2",
      "doc_count": 1,
      "count": {
        "value": 6
    }
    ]
  }
}
…

Wanted Result:

…
"aggregations": {
  "product": {
    "buckets": [
    {
      "key": "Beer",
      "doc_count": 2,
      "count": {
        "value": 5
      }
    },{
      "key": "Wine",
      "doc_count": 1,
      "count": {
        "value": 6
    }
    ]
  }
}
…

After reading the docs on scripts, I don't think this is possible, since it evaluates on the value only, and doesn't seem to have access to the entire doc (since there isn't a doc, but a set of docs).

Upvotes: 13

Views: 8154

Answers (4)

Hoàng Long
Hoàng Long

Reputation: 10848

From my understanding, the question wants to return product_name along with the aggregation result by product_id.

That problem could be solved by a topHit sub-aggregation:

{
    "aggregations": {
        "product": {
            "terms": {
                "field": "product_id"
            },
            "aggregations": {
                "day": {
                    "count": {
                        "value_count": {
                            "field": "quantity"
                        }
                    }
                },
                "topHits": {
                    "top_hits": {
                        "from": 0,
                        "size": 1,
                        "_source": false,
                        "docvalue_fields": [
                            {
                                "field": "product_name"
                            }
                        ]
                    }
                }
            }
        }
    }
}

Then the result would look like (removed the unnecessary fields):

{
    "aggregations": {
        "product": {
            "buckets": [
                {
                    "key": "1",
                    "doc_count": 2,
                    "topHits": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "hits": [
                                {
                                    "fields": {
                                        "product_name": [
                                            "Beer"
                                        ]
                                    }
                                }
                            ]
                        }
                    },
                    "count": {
                        "value": 5
                    }
                }
            ]
        }
    }
}

Notice that count & product_name is in the same bucket. By that way we can link product_name with count.

Note: If product_name is analyzed / normalized, then the returned result would also be normalized. To avoid that, we can create a sub-field and normalize that field instead.

Upvotes: 0

rjarmstrong
rjarmstrong

Reputation: 1231

You can do it with scripts if you use only the script attribute alone (this then has access to the entire doc). Then split it in your client: e.g.

"aggs": {
    "types_of": {
      "terms": {
        "script": "doc['product_name'].value + '|' + doc['product_id'].value"
      }
    }
  }

Upvotes: 5

lukens
lukens

Reputation: 490

You could use a child aggregation to get the name, so your query would be something like:

"aggregations": {
    "product": {
      "terms": {
        "field": "product_id"
      },
      "aggregations": {
        "name": {
          "terms": {
            "field": "product_name"
          }
        },
        "day": {
          "count": {
            "value_count": {
              "field": "quantity"
            }
          }
        }
      }
    }
  }
}

Upvotes: 0

DeH
DeH

Reputation: 571

If you have all powers over your indexation process, I'd suggest just adding a new field on your own (not analyzed), based on the product_id, and aggregating on that field instead.

I don't think (but I may be mistaken) that what you want to do is possible.

Upvotes: 0

Related Questions