sam

Reputation: 3511

Elasticsearch, filter on full-text string

I am just starting to use Elasticsearch and I have to work with data generated by a colleague. I noticed that every string field is mapped as a full-text value:

{
    "countryId": {
      "type": "string"
    }
}

but we never need to do a full-text search, so exact values with a filter search would be perfectly fine. The only problem is that the type of those values cannot be changed for the moment, because of a lack of time.

So my question is this: what will happen if I do a filter-based search on full-text values? Will the search criteria be analyzed as they would be in a match search, or will the filter ignore the full-text type of this value and process it as an exact value, saving a lot of search time since filters are really fast?

I looked into the documentation and around but could not get a clear answer.

Upvotes: 2

Views: 6977

Answers (1)

rchang

Reputation: 5236

You may have already empirically observed what happens when you try this, but in order for the term filter to behave as expected (exactly match the specified parameter in the specified field), the mapping for the index must define the field's index property as not_analyzed. The term filter does not analyze the value you pass in; it looks for that exact term in the index, and an analyzed field stores only the tokens the analyzer produced, not the original string. The official documentation for the term filter is here, but the most immediately relevant portion may be this:

Filters documents that have fields that contain a term (not analyzed).

So, your index should have a mapping defined similarly to the following:

{"mappings" : {"the_document_type": {"properties" : {
  "countryId" : {"type" : "string", "index" : "not_analyzed"},
  ...
  ... Mappings for other fields in your document
  ...
}}}}

Given the snippet above, a query containing a term filter requiring documents to exactly match some specified parameter for countryId should be successful. Something like the following:

{"query" : {"filtered" : {
  "query" : {"match_all" : {}},
  "filter" : {"term" : {"countryId" : "Kingdom of Anvilania"}}
}}}
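If you build the request from code rather than by hand, a minimal Python sketch of the same body could look like the following. The helper name term_filter_body is hypothetical and only here for illustration; it builds the JSON and does not send anything to Elasticsearch.

```python
import json


def term_filter_body(field, value):
    """Build a filtered query that matches all documents, then applies a term filter.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        }
    }


body = term_filter_body("countryId", "Kingdom of Anvilania")
# Serialize to the JSON string you would send as the search request body.
print(json.dumps(body, indent=2))
```

Note that the value is passed through untouched, which is exactly how the term filter will treat it.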

There's further documentation of the string type (and all other core types) in Elasticsearch here, but the specific portion about the index attribute is this:

Set to analyzed for the field to be indexed and searchable after being broken down into token using an analyzer. not_analyzed means that its still searchable, but does not go through any analysis process or broken down into tokens. no means that it won’t be searchable at all (as an individual field; it may still be included in _all). Setting to no disables include_in_all. Defaults to analyzed.
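To make the difference concrete, here is a small Python simulation of what ends up in the index under each setting and how a term lookup behaves against it. This is only a sketch: the analyze function is a rough stand-in for the default analyzer (lowercase, split on non-word characters) and is my assumption, not Elasticsearch code; the real analyzer does more than this.

```python
import re


def analyze(value):
    """Rough stand-in for the default analyzer (assumption): lowercase and split into tokens."""
    return [tok for tok in re.split(r"\W+", value.lower()) if tok]


def index_terms(value, index="analyzed"):
    """Return the terms that would be stored for a field value under the given index setting."""
    if index == "not_analyzed":
        return [value]  # the whole string is stored as a single term
    return analyze(value)


def term_filter_matches(indexed_terms, term):
    """A term filter does not analyze its input; it looks for the exact term."""
    return term in indexed_terms


country = "Kingdom of Anvilania"
analyzed_terms = index_terms(country)
exact_terms = index_terms(country, index="not_analyzed")

print(term_filter_matches(analyzed_terms, country))      # False
print(term_filter_matches(exact_terms, country))         # True
print(term_filter_matches(analyzed_terms, "anvilania"))  # True
```

So against an analyzed field, a term filter on the full original string finds nothing, while a filter on a single lowercased token does match, which is usually not what you want for exact-value fields.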

Upvotes: 3
