Aggregations in Elasticsearch cutting string instead of taking everything

Question

Having the following simple mapping:

curl -XPUT localhost:9200/transaciones/ -d '{
    "mappings": {
        "ventas": {
            "properties": {
                "tipo": { "type": "string" },
                "cantidad": { "type": "double" }
            }
        }
    }
}'

Adding data:

curl -XPUT localhost:9200/transaciones/ventas/1 -d '{
    "tipo": "Ingreso bancario",
    "cantidad": 80
}'

curl -XPUT localhost:9200/transaciones/ventas/2 -d '{
    "tipo": "Ingreso bancario",
    "cantidad": 10
}'

curl -XPUT localhost:9200/transaciones/ventas/3 -d '{
    "tipo": "PayPal",
    "cantidad": 30
}'

curl -XPUT localhost:9200/transaciones/ventas/4 -d '{
    "tipo": "Tarjeta de credito",
    "cantidad": 130
}'

curl -XPUT localhost:9200/transaciones/ventas/5 -d '{
    "tipo": "Tarjeta de credito",
    "cantidad": 130
}'

When I try to get the aggs with:

curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
    "size": 0,
    "aggs": {
        "tipos_de_venta": {
            "terms": {
                "field": "tipo"
            }
        }
    }
}'

The response is:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "tipos_de_venta" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "bancario",
        "doc_count" : 2
      }, {
        "key" : "credito",
        "doc_count" : 2
      }, {
        "key" : "de",
        "doc_count" : 2
      }, {
        "key" : "ingreso",
        "doc_count" : 2
      }, {
        "key" : "tarjeta",
        "doc_count" : 2
      }, {
        "key" : "paypal",
        "doc_count" : 1
      } ]
    }
  }
}

As you can see, it splits the string Tarjeta de credito into tarjeta, de and credito. How can I get the entire string as the bucket key without using not_analyzed on tipo in the mapping? My desired output would be Ingreso bancario, PayPal and Tarjeta de credito; the response would look something like this:

 "aggregations" : {
    "tipos_de_venta" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "Ingreso bancario",
        "doc_count" : 2
      }, {
        "key" : "PayPal",
        "doc_count" : 1
      }, {
        "key" : "Tarjeta de credito",
        "doc_count" : 2
      } ]
    }
  }

PS: I'm using ES 2.3.2

Val · Accepted Answer

It's because your tipo field is an analyzed string, so the terms aggregation runs on the individual tokens rather than on the original value. The right way to get what you want is to add a not_analyzed sub-field:

curl -XPUT localhost:9200/transaciones/_mapping/ventas -d '{
    "properties": {
        "tipo": { 
           "type": "string",
           "fields": {
               "raw": {
                   "type": "string",
                   "index": "not_analyzed"
               }
           }
        }
    }
}'
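To illustrate why the raw sub-field changes the buckets, here is a small offline Python sketch. It only approximates the default analyzer (lowercasing and splitting on non-word characters) and is not how Elasticsearch implements analysis; the function names are hypothetical and chosen for this example.

```python
import re
from collections import Counter

# The "tipo" values of the five documents from the question.
docs = [
    "Ingreso bancario",
    "Ingreso bancario",
    "PayPal",
    "Tarjeta de credito",
    "Tarjeta de credito",
]

def analyzed_terms(value):
    # Rough stand-in for the default analyzer (an assumption, not Lucene's code):
    # lowercase the value and split it on non-word characters.
    return set(re.findall(r"\w+", value.lower()))

def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as a single term.
    return {value}

def terms_agg(values, to_terms):
    # doc_count per term = number of documents that contain the term.
    counts = Counter()
    for value in values:
        counts.update(to_terms(value))
    return dict(counts)

analyzed_buckets = terms_agg(docs, analyzed_terms)
raw_buckets = terms_agg(docs, not_analyzed_terms)

print(analyzed_buckets)
print(raw_buckets)
```

The first dictionary has one entry per word, matching the buckets in the question; the second has one entry per full value, which is what an aggregation on tipo.raw returns.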

Then you need to reindex your documents so that tipo.raw gets populated. After that, you can run the aggregation on tipo.raw and get the desired results:

curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
    "size": 0,
    "aggs": {
        "tipos_de_venta": {
            "terms": {
                "field": "tipo.raw"
            }
        }
    }
}'
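The answer does not say how to reindex. One option, assuming you still have the source data at hand, is to send the same documents again through the bulk API. The Python sketch below only builds the request body for the five documents from the question; the helper name bulk_body is hypothetical.

```python
import json

# The five documents from the question, keyed by their ids.
docs = {
    1: {"tipo": "Ingreso bancario", "cantidad": 80},
    2: {"tipo": "Ingreso bancario", "cantidad": 10},
    3: {"tipo": "PayPal", "cantidad": 30},
    4: {"tipo": "Tarjeta de credito", "cantidad": 130},
    5: {"tipo": "Tarjeta de credito", "cantidad": 130},
}

def bulk_body(index, doc_type, documents):
    # The bulk API expects newline-delimited JSON: an action line
    # followed by the document source, with a trailing newline at the end.
    lines = []
    for doc_id, source in sorted(documents.items()):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

body = bulk_body("transaciones", "ventas", docs)
print(body, end="")
```

Saved to a file (say reindex.ndjson), the output can be sent with curl -XPOST localhost:9200/_bulk --data-binary @reindex.ndjson. Indexing a document under an existing id overwrites it, so the new tipo.raw sub-field is filled in for each one.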

UPDATE

If you really don't want to create a not_analyzed field, there is another way: a terms aggregation with a script. Be aware that it can really kill the performance of your cluster, because the script has to read _source for every matching document:

curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
    "size": 0,
    "aggs": {
        "tipos_de_venta": {
            "terms": {
                "script": "_source.tipo"
            }
        }
    }
}'
