anoop chandran

Reputation: 1490

How to do aggregation on nested objects - Elasticsearch

I'm pretty new to Elasticsearch so please bear with me. This is part of my document in ES.

{
  "source": {
    "detail": {
      "attribute": {
        "Size": ["32 Gb",4],
        "Type": ["Tools",4],
        "Brand": ["Sandisk",4],
        "Color": ["Black",4],
        "Model": ["Sdcz36-032g-b35",4],
        "Manufacturer": ["Sandisk",4]
      }
    },
    "title": {
      "list": [
        "Sandisk Cruzer 32gb Usb 32 Gb Flash Drive , Black - Sdcz36-032g"
      ]
    }
  }
}

What I want is to find the top three attributes of the attribute object. For example, if I search for "sandisk", I want to get three attributes such as ["Size", "Color", "Model"], or whichever attributes come out on top of a top-hits-style aggregation. So I wrote a query like this:

{
  "size": 0,
  "aggs": {
    "categoryList": {
      "filter": {
        "bool": {
          "filter": [
            {
              "term": {
                "title.list": "sandisk"
              }
            }
          ]
        }
      },
      "aggs": {
        "results": {
          "terms": {
            "field": "detail.attribute",
            "size": 3
          }
        }
      }
    }
  }
}

But it does not seem to work. How do I fix this? Any hints would be much appreciated.

This is the output of _mappings. It is not complete, but I think it should suffice.

{
  "catalog2_0": {
    "mappings": {
      "product": {
        "dynamic": "strict",
        "dynamic_templates": [
          {
            "attributes": {
              "path_match": "detail.attribute.*",
              "mapping": {
                "type": "text"
              }
            }
          }
        ],
        "properties": {

          "detail": {
            "properties": {
              "attMaxScore": {
                "type": "scaled_float",
                "scaling_factor": 100
              },
              "attribute": {
                "dynamic": "true",
                "properties": {
                  "Brand": {
                    "type": "text"
                  },
                  "Color": {
                    "type": "text"
                  },
                  "MPN": {
                    "type": "text"
                  },
                  "Manufacturer": {
                    "type": "text"
                  },
                  "Model": {
                    "type": "text"
                  },
                  "Operating System": {
                    "type": "text"
                  },
                  "Size": {
                    "type": "text"
                  },
                  "Type": {
                    "type": "text"
                  }
                }
              },
              "description": {
                "type": "text"
              },
              "feature": {
                "type": "text"
              },
              "tag": {
                "type": "text",
                "fields": {
                  "raw": {
                    "type": "keyword"
                  }
                }
              }
            }
          },

          "title": {
            "properties": {

              "en": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}

Upvotes: 2

Views: 5843

Answers (2)

James-Jesse Drinkard

Reputation: 15703

Here is what I have used for a nested aggregation query, with the actual names replaced by placeholders. The field being aggregated is a keyword, which, as already mentioned, is required, and it is part of a nested JSON object:

"STATUS_ID": {
  "type": "keyword",
  "index": "not_analyzed",
  "doc_values": true
},

Query

GET <index name>/_search?size=200
{
  "aggs": {
    "panels": {
      "nested": {
        "path": "<nested path>"
      },
      "aggs": {
        "statusCodes": {
          "terms": {
            "field": "<nested path>.STATUS.STATUS_ID",
            "size": 50
          }
        }
      }
    }
  }
}

Result

"aggregations": {
    "panels": {
      "doc_count": 12108963,
      "statusCodes": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "O",
            "doc_count": 5912218
          },
          {
            "key": "C",
            "doc_count": 401586
          },
          {
            "key": "E",
            "doc_count": 135628
          },
          {
            "key": "Y",
            "doc_count": 3742
          },
          {
            "key": "N",
            "doc_count": 1012
          },
          {
            "key": "L",
            "doc_count": 719
          },
          {
            "key": "R",
            "doc_count": 243
          },
          {
            "key": "H",
            "doc_count": 86
          }
        ]
      }
    }
}
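
To show how a result of this shape can be consumed, here is a minimal Python sketch. The helper name `read_nested_terms` is hypothetical and not part of any Elasticsearch client. It relies only on the fact that the response is keyed by the aggregation names used in the request, here `panels` and `statusCodes`. The sample response is the result above trimmed to its first two buckets.

```python
def read_nested_terms(response, nested_name, terms_name):
    """Return {bucket key: doc_count} for a terms aggregation that sits
    inside a nested aggregation.

    nested_name and terms_name are the aggregation names chosen in the
    request; Elasticsearch echoes them back as keys in the response.
    """
    nested = response["aggregations"][nested_name]
    return {b["key"]: b["doc_count"] for b in nested[terms_name]["buckets"]}


# Sample response: same shape as the result above, first two buckets only.
sample = {
    "aggregations": {
        "panels": {
            "doc_count": 12108963,
            "statusCodes": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "O", "doc_count": 5912218},
                    {"key": "C", "doc_count": 401586},
                ],
            },
        }
    }
}

counts = read_nested_terms(sample, "panels", "statusCodes")
print(counts)
```

The outer `doc_count` is the number of nested documents visited, so it can be larger than the number of top-level documents in the index.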

Upvotes: 1

Lupanoide

Reputation: 3212

  • According to the documentation, you can't run an aggregation on a field with the text datatype (unless fielddata is enabled on it, which it is not by default). The field should have the keyword datatype.

  • You also can't aggregate on the detail.attribute field in that way. The detail.attribute field doesn't store any value: it is an object datatype, not a nested one as you have written in the question, which means it is only a container for other fields such as Size, Brand, etc. So you should aggregate on a field like detail.attribute.Size instead, provided it has the keyword datatype.

  • Another probable error is that you are running a term query on a field with the text datatype (what is the datatype of the title.list field?). A term query is meant for fields with the keyword datatype, while a match query is used to query fields with the text datatype.
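
The three points above can be sketched as request bodies. This is a rough Python sketch under stated assumptions, not a tested fix for the asker's index. It only builds the JSON bodies and does not talk to a cluster. The function names are hypothetical. It assumes that `title.list` is a text field, and that each attribute has a keyword sub-field named `keyword`, which the mapping in the question does not define. The second builder is one possible way to rank attribute names rather than values, using one `exists` filter per attribute.

```python
import json


def build_value_terms_request(search_text, attribute, size=3):
    """Terms aggregation over the values of a single attribute.

    ASSUMPTION: detail.attribute.<attribute>.keyword exists as a keyword
    sub-field; the mapping shown in the question only has text fields.
    """
    return {
        "size": 0,
        # match analyzes the search text, which suits a text field.
        "query": {"match": {"title.list": search_text}},
        "aggs": {
            "results": {
                "terms": {
                    "field": f"detail.attribute.{attribute}.keyword",
                    "size": size,
                }
            }
        },
    }


def build_attribute_count_request(search_text, attribute_names):
    """Count, per attribute name, the matching documents that have it."""
    # The filters aggregation creates one named bucket per exists query.
    per_attribute = {
        name: {"exists": {"field": f"detail.attribute.{name}"}}
        for name in attribute_names
    }
    return {
        "size": 0,
        "query": {"match": {"title.list": search_text}},
        "aggs": {"attributes": {"filters": {"filters": per_attribute}}},
    }


def top_attributes(response, n=3):
    """Pick the n attribute names with the highest doc_count."""
    buckets = response["aggregations"]["attributes"]["buckets"]
    ranked = sorted(
        buckets.items(), key=lambda item: item[1]["doc_count"], reverse=True
    )
    return [name for name, _ in ranked[:n]]


body = build_attribute_count_request("sandisk", ["Size", "Color", "Model", "Brand"])
print(json.dumps(body, indent=2))
```

The printed body can be sent to the index's `_search` endpoint, and `top_attributes` applied to the parsed response. Listing the attribute names up front is a limitation of this approach; indexing the names into a separate keyword field would avoid it, at the cost of a mapping change and a reindex.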

Upvotes: 2
