user1249791

Reputation: 1271

elasticsearch splits by space in facets

I am trying to run a simple facet request over a field containing more than one word (e.g. 'Name1 Name2', sometimes with dots and commas inside), but what I get is...

 "terms" : [{
    "term" : "Name1",
    "count" : 15
},
{
    "term" : "Name2",
    "count" : 15
}]

So my field value is split on spaces before the facet is computed, and each word is counted as a separate term.

Query example:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": [
        "dataset"
      ],
      "query": "2",
      "default_operator": "AND"
    }
  },
  "facets": {
    "test": {
      "terms": {
        "field": [
          "speciesName"
        ],
        "size": 50000
      }
    }
  }
}'

Upvotes: 1

Views: 2323

Answers (2)

javanna

Reputation: 60245

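The terms facet counts the terms that were actually indexed, so an analyzed field produces one entry per token. Your field shouldn't be analyzed, or at least not tokenized. To index the field without tokenizing it, you need to update your mapping and then reindex.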
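As a rough sketch of what that could look like on a facets-era cluster: the snippet below defines the body for a new index in which speciesName is mapped as a not_analyzed string, plus a small helper that creates the index. The host, type, and field names come from the question; the new index name idx_occurrence_v2 and the shell variable and function names are assumptions made up for illustration.

```shell
# Host, type and field names are taken from the question.
ES_URL="http://my_server:9200"
NEW_INDEX="idx_occurrence_v2"   # hypothetical name for the new index

# Index body: speciesName is indexed as a single, untokenized term.
INDEX_BODY='{
  "mappings": {
    "Occurrence": {
      "properties": {
        "speciesName": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Creates the new index with the mapping above (only runs when called).
create_index() {
  curl -XPUT "$ES_URL/$NEW_INDEX" -d "$INDEX_BODY"
}
```

After calling create_index, the documents still have to be indexed again into the new index before the facet returns whole values such as 'Name1 Name2'. Note that "index": "not_analyzed" belongs to the old string type; later Elasticsearch versions replaced it with the keyword type.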

Upvotes: 6

imotov

Reputation: 30163

First of all, javanna provided a very good answer from a practical perspective. However, for the sake of completeness, I want to mention that in some cases there is a way to do it without reindexing the data.

If the speciesName field is stored and your queries produce a relatively small number of results, you can use script_field in the terms facet to retrieve the stored field values:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": ["dataset"],
      "query": "2",
      "default_operator": "AND"
    }
  },
  "facets": {
    "test": {
      "terms": {
        "script_field": "_fields['\''speciesName'\''].value",
        "size": 50000
      }
    }
  }
}
'

With this query, Elasticsearch retrieves the speciesName field for every record in your result set and builds the facet from these values. If your result set contains millions of records, expect this query to be slow.

Similarly, if the field is not stored but the record source is, you can use script_field in the facet to retrieve the field values from the source:

......
"script_field": "_source['\''speciesName'\'']",
......

Again, the source of every record in the result set will be retrieved and parsed, so you might need some patience to run this query on a large set of records.

Upvotes: 4
