Gulyaev

Reputation: 33

Elasticsearch aggregations for faceted search excluding some fields

I have a shop that uses Elasticsearch 2.4 for faceted search. At the moment the available filters (product attributes) are loaded from MySQL. I want to build them with Elasticsearch aggregations instead, but I have a problem: I do not need to aggregate all of the attributes.

What I have:

Part of the mapping:

...
'is_active' => [
    'type' => 'long',
    'index' => 'not_analyzed',
],
'category_id' => [
    'type' => 'long',
    'index' => 'not_analyzed',
],
'attrs' => [
    'properties' => [
        'attr_name' => ['type' => 'string', 'index'     => 'not_analyzed'],
        'value' => [
            'type' => 'string',
            'index' => 'analyzed',
            'analyzer' => 'attrs_analizer',
        ],
    ]
],
...

Example of data:

{
    "id": 1,
    "is_active": "1",
    "category_id": 189,
    ...
    "price": "48.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "TP-Link"
      },
      {
        "attr_name": "Model",
        "value": "TL-1"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  },
  {
    "id": 2,
    "is_active": "1",
    "category_id": 242,
    ...
    "price": "12.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Lenovo"
      },
      {
        "attr_name": "Model",
        "value": "B570"
      },
      {
        "attr_name": "OS",
        "value": "Linux"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  },
  {
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    ...
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "Windows"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  }

Attributes such as "Model" and "Other" are not used for filtering products; they are only displayed on the product page. For the other attributes (Brand, OS, and so on) I want to get aggregations.

When I aggregate on the attrs.value field, I of course get buckets for all values (including the large "Other" fields, which can contain a lot of HTML).

"aggs": {
    "facet_value": {
      "terms": {
        "field": "attrs.value",
        "size": 0
      }
    }
  }

How can I exclude "attrs.attr_name": ["Model", "Other"]?

Changing the mapping is a bad solution for me, but if it is unavoidable, how should I do it? I guess I will need to make "attrs" nested?

UPD:

I want to get:
1. All the attributes that the products in a given category have, except for those that I list in my system's settings (in this example I exclude "Model" and "Other").
2. The number of products next to each value.

It should look like this:

For category "Laptops":

Brand:

OS:

For "computer monitors":

Brand:

Resolution:

This is a terms aggregation; I already use it to count the products in each category. I tried it on attrs.value, but I do not know how to exclude the attrs.value entries that belong to "attrs.attr_name": "Model" and "attrs.attr_name": "Other".
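
To make the target concrete, here is a minimal Python sketch that computes the wanted facets directly from trimmed copies of the sample products above. It does not use Elasticsearch; the function name `expected_facets` and the trimmed data are assumptions made for illustration only.

```python
from collections import defaultdict

# Trimmed copies of the sample products (only the fields used here).
PRODUCTS = [
    {"id": 1, "category_id": 189, "attrs": [
        {"attr_name": "Brand", "value": "TP-Link"},
        {"attr_name": "Model", "value": "TL-1"},
    ]},
    {"id": 2, "category_id": 242, "attrs": [
        {"attr_name": "Brand", "value": "Lenovo"},
        {"attr_name": "Model", "value": "B570"},
        {"attr_name": "OS", "value": "Linux"},
    ]},
    {"id": 3, "category_id": 242, "attrs": [
        {"attr_name": "Brand", "value": "Asus"},
        {"attr_name": "Model", "value": "QZ85"},
        {"attr_name": "OS", "value": "Windows"},
    ]},
]


def expected_facets(products, category_id, excluded_names):
    """Return {attr_name: {value: product_count}} for one category."""
    facets = defaultdict(lambda: defaultdict(int))
    for product in products:
        if product["category_id"] != category_id:
            continue
        # Use a set so a product counts once per (name, value) pair.
        seen = {(a["attr_name"], a["value"]) for a in product["attrs"]}
        for name, value in seen:
            if name not in excluded_names:
                facets[name][value] += 1
    return {name: dict(values) for name, values in facets.items()}


facets = expected_facets(PRODUCTS, 242, {"Model", "Other"})
print(facets)
```

For category 242 this yields one "Brand" group and one "OS" group, each value with a product count of 1, and no "Model" group.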

UPD2:

In my case, mapping attrs as a nested type increases the index size by 30%, from 2700Mi to 3510Mi. If there is no other option, I will have to put up with it.

Upvotes: 3

Views: 966

Answers (1)

user3775217

Reputation: 4803

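You have to map attrs as a nested type first and then use nested aggregations. (The mapping below uses the keyword type, which is available from Elasticsearch 5.0; on 2.4 the equivalent is a string field with "index": "not_analyzed".)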

PUT no_play
{
  "mappings": {
    "document_type" : {
      "properties": {
        "is_active" : {
          "type": "long"
        },
        "category_id" : {
          "type": "long"
        },
        "attrs" : {
          "type": "nested", 
          "properties": {
            "attr_name" : {
              "type" : "keyword"
            },
            "value" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}
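
As background on why the nested type matters: without it, an array of objects is indexed as flat multi-value fields, so the link between an attr_name and its value is lost and a filter on attr_name cannot pick out the matching values. The Python sketch below is a simplified model of that flattening, not Elasticsearch code; `flatten_object_array` is a hypothetical helper.

```python
def flatten_object_array(doc, field):
    """Mimic a non-nested object array: one flat list of values per sub-field."""
    flat = {}
    for obj in doc[field]:
        for key, val in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(val)
    return flat


doc = {"attrs": [
    {"attr_name": "Brand", "value": "Asus"},
    {"attr_name": "Model", "value": "QZ85"},
]}

flat = flatten_object_array(doc, "attrs")
print(flat)
# {'attrs.attr_name': ['Brand', 'Model'], 'attrs.value': ['Asus', 'QZ85']}
```

In the flattened form nothing records that "QZ85" belongs to "Model" rather than "Brand", which is the association the nested mapping preserves.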


POST no_play/document_type
  {
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  }

You didn't mention how you want to aggregate, so here are two cases.

Case 1) Count each attrs entry individually. This gives the number of occurrences of each term.

POST no_play/_search
{
  "size": 0,
  "aggs": {
    "nested_aggregation_value": {
      "nested": {
        "path": "attrs"
      },
      "aggs": {
        "value_term": {
          "terms": {
            "field": "attrs.value",
            "size": 10
          }
        }
      }
    }
  }
}

Case 2) Count root documents per value, so that each document is counted once per value:

POST no_play/_search
{
  "size": 0,
  "aggs": {
    "nested_aggregation_value": {
      "nested": {
        "path": "attrs"
      },
      "aggs": {
        "value_term": {
          "terms": {
            "field": "attrs.value",
            "size": 10
          },
          "aggs": {
            "reverse_back_to_roots": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

To count root documents per attrs value, as the second query does, attach a reverse_nested aggregation; it moves the aggregation context one level up, back to the root document.

Consider the following document:

{
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "repeated value"
      },
      {
        "attr_name": "Other",
        "value": "repeated value"
      }
    ]
  }

For the first query the count for 'repeated value' will be 2; for the second query it will be 1.
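
The difference between the two counts can be reproduced with a small Python sketch. It only models the counting logic on the document above and makes no Elasticsearch calls; the variable names are illustrative.

```python
from collections import Counter

doc = {"id": 3, "attrs": [
    {"attr_name": "Brand", "value": "Asus"},
    {"attr_name": "Model", "value": "QZ85"},
    {"attr_name": "OS", "value": "repeated value"},
    {"attr_name": "Other", "value": "repeated value"},
]}
docs = [doc]

# First query: every nested attrs object is counted on its own.
nested_counts = Counter(a["value"] for d in docs for a in d["attrs"])

# Second query (reverse_nested): each root document counts once per value.
root_counts = Counter(
    value for d in docs for value in {a["value"] for a in d["attrs"]}
)

print(nested_counts["repeated value"], root_counts["repeated value"])  # 2 1
```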

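Note: here is how to filter out attributes by name. List the names to exclude in the terms filter; the example excludes Model and Brand, whereas for the question the list would be Model and Other. The first query counts attrs entries, the second counts root documents.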

POST no_play/_search
{
    "size": 0,
    "aggs": {
        "nested_aggregation_value": {
            "nested": {
                "path": "attrs"
            },
            "aggs": {
                "filtered_results": {
                    "filter": {
                        "bool": {
                            "must_not": [{
                                "terms": {
                                    "attrs.attr_name": ["Model", "Brand"]
                                }
                            }]
                        }
                    },
                    "aggs": {
                        "value_term": {
                            "terms": {
                                "field": "attrs.value",
                                "size": 10
                            }
                        }
                    }
                }
            }
        }
    }
}


POST no_play/_search
{
    "size": 0,
    "aggs": {
        "nested_aggregation_value": {
            "nested": {
                "path": "attrs"
            },
            "aggs": {
                "filtered_results": {
                    "filter": {
                        "bool": {
                            "must_not": [{
                                "terms": {
                                    "attrs.attr_name": ["Model", "Brand"]
                                }
                            }]
                        }
                    },
                    "aggs": {
                        "value_term": {
                            "terms": {
                                "field": "attrs.value",
                                "size": 10
                            },
                            "aggs": {
                                "reverse_back_to_roots": {
                                    "reverse_nested": {}
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
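
The queries above bucket by attrs.value only. To group the values under each attribute name, as the question sketches, one possible extension is to put a terms aggregation on attrs.attr_name around the one on attrs.value and to restrict the search to one category. The Python sketch below only builds the request body; the function name `build_facet_query` and the aggregation names (`nested_attrs`, `filtered_attrs`, `by_name`, `by_value`, `products`) are assumptions, and the body has not been run against the asker's index.

```python
import json


def build_facet_query(category_id, excluded_names, size=10):
    """Build a search body that buckets attrs by name, then by value."""
    return {
        "size": 0,
        # Restrict the facets to a single category.
        "query": {"term": {"category_id": category_id}},
        "aggs": {
            "nested_attrs": {
                "nested": {"path": "attrs"},
                "aggs": {
                    "filtered_attrs": {
                        # Drop the attribute names that must not become facets.
                        "filter": {"bool": {"must_not": [
                            {"terms": {"attrs.attr_name": sorted(excluded_names)}}
                        ]}},
                        "aggs": {
                            "by_name": {
                                "terms": {"field": "attrs.attr_name", "size": size},
                                "aggs": {
                                    "by_value": {
                                        "terms": {"field": "attrs.value", "size": size},
                                        # Count root documents, not attrs entries.
                                        "aggs": {"products": {"reverse_nested": {}}},
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = build_facet_query(242, ["Model", "Other"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST no_play/_search. In the response, each by_name bucket would be one facet (Brand, OS, ...), and the doc_count of the products aggregation inside each by_value bucket would be the number of products to show next to that value.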

Thanks

Upvotes: 1
