elasticsearch splits by space in facets

Question

I am trying to run a simple facet request over a field whose values contain more than one word (e.g. 'Name1 Name2', sometimes with dots and commas inside), but what I get is:

"terms" : [{
    "term" : "Name1",
    "count" : 15
},
{
    "term" : "Name2",
    "count" : 15
}]

So the field value is split on spaces before the facet is computed, and each word is counted as a separate term.

Query example:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": [
        "dataset"
      ],
      "query": "2",
      "default_operator": "AND"
    }
  },
  "facets": {
    "test": {
      "terms": {
        "field": [
          "speciesName"
        ],
        "size": 50000
      }
    }
  }
}'

imotov · Accepted Answer

First of all, javanna provided a very good answer from a practical perspective. However, for the sake of completeness, I want to mention that in some cases there is a way to do it without reindexing the data.

If the speciesName field is stored and your queries produce a relatively small number of results, you can use script_field to retrieve the stored field values:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": ["dataset"],
      "query": "2",
      "default_operator": "AND"
    }
  },
  "facets": {
    "test": {
      "terms": {
        "script_field": "_fields['\''speciesName'\''].value",
        "size": 50000
      }
    }
  }
}'

With this query, Elasticsearch retrieves the speciesName field for every record in the result set and constructs the facets from these values. Needless to say, if the result set contains millions of records, the query might be sluggish.
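To make the difference concrete, here is a minimal Python sketch of the two counting strategies. It is only a simplified model, not what Elasticsearch does internally: the sample values and both function names are made up for illustration, and plain whitespace splitting stands in for the analyzer (a real analyzer may also lowercase terms or strip punctuation).

```python
from collections import Counter

# Hypothetical sample values, chosen only for illustration.
species_names = ["Name1 Name2", "Name1 Name2", "Name3 Name4"]


def facet_on_tokens(values):
    """Count every whitespace-separated token, roughly what a terms
    facet sees when the field has been analyzed at index time."""
    counts = Counter()
    for value in values:
        counts.update(value.split())
    return counts


def facet_on_whole_values(values):
    """Count every complete value, which is what a facet built from
    the stored field values amounts to."""
    return Counter(values)


print(facet_on_tokens(species_names))
print(facet_on_whole_values(species_names))
```

The first function reproduces the symptom from the question (separate counts for "Name1" and "Name2"), while the second keeps "Name1 Name2" as a single facet term. It also shows where the cost comes from: every value in the result set has to be fetched and counted.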

Similarly, if the field is not stored but the record source is, you can use a script_field facet to retrieve the field values from the source:

......
"script_field": "_source['\''speciesName'\'']",
......

Again, the source of each record in the result list will be retrieved and parsed, so you might need some patience to run this query on a large set of records.
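The '\'' sequences in the curl commands are only shell escaping for a single quote inside a single-quoted string; the script that actually reaches the server is _fields['speciesName'].value or _source['speciesName']. If the quoting gets in the way, the request body can be built programmatically instead. The following Python sketch does that under two assumptions: build_facet_request is a hypothetical helper name, and the parts elided in the _source variant are taken to be the same as in the full query shown earlier.

```python
import json


def build_facet_request(script, dataset_query="2", size=50000):
    """Return the JSON body for a terms facet that uses script_field.

    Hypothetical helper: it mirrors the query from the answer and only
    swaps in the script string.
    """
    body = {
        "query": {
            "query_string": {
                "fields": ["dataset"],
                "query": dataset_query,
                "default_operator": "AND",
            }
        },
        "facets": {
            "test": {
                "terms": {
                    "script_field": script,
                    "size": size,
                }
            }
        },
    }
    return json.dumps(body, indent=2)


# Variant for a stored field, and variant that reads from the record source.
stored_body = build_facet_request("_fields['speciesName'].value")
source_body = build_facet_request("_source['speciesName']")
print(source_body)
```

The resulting string can be sent as the body of the same _search request with any HTTP client, or saved to a file and passed to curl with -d @file.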
