Reputation: 1838
I'm working on a query based on name fields on Elasticsearch 2.4. The fields I'm interested in are:
If I send this query:
{"query":
{"bool" :
{"must" : [
{"match" : {"state" : {"query" : "michoacán de ocampo", "type" : "boolean"} } },
{"match" : {"colony" : {"query" : "zamora", "type" : "boolean"} } },
{"match" : {"city" : {"query" : "zamora", "type" : "boolean"} } }
],
"filter" : {"term" : {"state" : "michoacán"} }
}
} }
Results
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "71807",
"_index": "my_place",
"_score": 8.708784,
"_source": {
"@timestamp": "2019-11-13T15:34:33.373Z",
"@version": "1",
"city": "Zamora",
"city_id": 828,
"colony": "Balcones de Zamora",
"id": 71807,
"state": "Michoacán de Ocampo",
"state_id": 16,
"type": "place",
"zipcode": "59624",
"zone_id": null
},
"_type": "place"
},
{
"_id": "71762",
"_index": "my_place",
"_score": 8.634264,
"_source": {
"@timestamp": "2019-11-13T15:34:33.112Z",
"@version": "1",
"city": "Zamora",
"city_id": 828,
"colony": "Zamora de Hidalgo Centro",
"id": 71762,
"state": "Michoacán de Ocampo",
"state_id": 16,
"type": "place",
"zipcode": "59600",
"zone_id": null
},
"_type": "place"
}
],
"max_score": 8.708784,
"total": 2
},
"timed_out": false,
"took": 5
}
Which are OK
But if I send the full name of the state in the filter, like this (note the full name "Michoacán de Ocampo" in the filter):
{"query":
{"bool" :
{"must" : [
{"match" : {"state" : {"query" : "michoacán de ocampo", "type" : "boolean"} } },
{"match" : {"colony" : {"query" : "zamora", "type" : "boolean"} } },
{"match" : {"city" : {"query" : "zamora", "type" : "boolean"} } }
],
"filter" : {"term" : {"state" : "Michoacán de Ocampo"} }
}
} }
I got these results:
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [],
"max_score": null,
"total": 0
},
"timed_out": false,
"took": 6
}
I need to send the full name in the filter. How can I achieve this, or how should I reconfigure my index to get the same results as with the first query?
Upvotes: 2
Views: 5300
Reputation: 32386
Update: the OP mentioned in a comment that they are using Elasticsearch 2.4, so I am adding a solution that works on that version. The index settings and mapping come first, followed by the query:
{
"settings": {
"analysis": {
"analyzer": {
"lckeyword": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"so": {
"properties": {
"state": {
"type": "string"
},
"city": {
"type": "string"
},
"colony": {
"type": "string"
},
"state_raw": {
"type": "string",
"analyzer": "lckeyword"
}
}
}
}
}
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"match": {
"state": {
"query": "michoacán de ocampo"
}
}
},
{
"match": {
"colony": {
"query": "zamora"
}
}
},
{
"match": {
"city": {
"query": "zamora"
}
}
}
]
}
},
"filter": {
"term": {
"state_raw": "michoacán de ocampo"
}
}
}
}
}
The important part is the custom analyzer (`keyword` tokenizer with a `lowercase` filter): the field used in the filter is indexed as a single, unsplit token, only lowercased, which is why the value in the term filter is sent in lowercase. The query above returns both of your documents. The Postman collection linked at the end of this answer contains the index creation, the sample documents, and the query.
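Because `state_raw` holds one lowercased token, the client has to lowercase the state name before putting it in the term filter. Here is a minimal Python sketch of that step. `build_place_query` is a hypothetical helper, not part of any Elasticsearch client, and it assumes Python's `str.lower()` agrees with the `lowercase` token filter for these names.

```python
import json

def build_place_query(state, colony, city):
    """Build the ES 2.4 `filtered` query body shown above (hypothetical helper)."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "should": [
                            {"match": {"state": {"query": state}}},
                            {"match": {"colony": {"query": colony}}},
                            {"match": {"city": {"query": city}}},
                        ]
                    }
                },
                # Term filters are not analyzed, so lowercase here to match the
                # single lowercased token produced by the `lckeyword` analyzer.
                "filter": {"term": {"state_raw": state.lower()}},
            }
        }
    }

body = build_place_query("Michoacán de Ocampo", "zamora", "zamora")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed body can be sent as-is to the `_search` endpoint of the index. The match clauses do not need lowercasing because match queries are analyzed.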
The issue is that your `state` field is an analyzed `text` field, while your filter uses a [term][1] query, which is not analyzed. From the official ES documentation:

> Returns documents that contain an exact term in a provided field.
So ES looks for the single token `Michoacán de Ocampo` in the inverted index. It is not there, because `state` is analyzed and its value is indexed as three tokens, `michoacán`, `de` and `ocampo`, and ES matches the search term against the indexed tokens exactly. You can inspect these tokens with the [analyze API][2], and use the [explain API][3] to see which tokens matched when a query does return results.
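The token mismatch can be illustrated with a few lines of Python. This is only a sketch: `simple_analyze` is a hypothetical stand-in that lowercases and splits on whitespace, which roughly approximates the standard analyzer for this input and is not Elasticsearch's actual implementation.

```python
def simple_analyze(text):
    """Hypothetical approximation of the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()

# Tokens that end up in the inverted index for the analyzed `state` field.
indexed_tokens = set(simple_analyze("Michoacán de Ocampo"))

def term_matches(term, tokens):
    """A term query is not analyzed: the term must equal an indexed token exactly."""
    return term in tokens

print(sorted(indexed_tokens))                               # ['de', 'michoacán', 'ocampo']
print(term_matches("michoacán", indexed_tokens))            # True
print(term_matches("Michoacán de Ocampo", indexed_tokens))  # False
```

The first lookup succeeds only because `michoacán` happens to be one of the three indexed tokens, which is why the original filter on the short name appeared to work.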
Fix
---
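Define the `state` field as a [multi-field][4] that also stores the value as-is (in `keyword` form), so that you can filter on it. Note that the `text` and `keyword` types are available in Elasticsearch 5.0 and later.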
{
"mappings": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"city": {
"type": "text"
},
"colony": {
"type": "text"
}
}
}
}
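As a rough mental model of this mapping, the same source value is indexed twice: once analyzed, once verbatim. The Python sketch below is an assumption-laden illustration, not Elasticsearch code: `index_state` and `term_filter` are hypothetical names, and lowercasing plus whitespace splitting only approximates the standard analyzer.

```python
def index_state(value):
    """Model one document's `state` value under the multi-field mapping."""
    return {
        # Analyzed `text` field: rough stand-in for the standard analyzer.
        "state": set(value.lower().split()),
        # `keyword` sub-field: the whole value kept verbatim as one term.
        "state.raw": {value},
    }

def term_filter(indexed, field, term):
    """Term lookups are exact: no lowercasing or splitting is applied to `term`."""
    return term in indexed[field]

doc = index_state("Michoacán de Ocampo")
print(term_filter(doc, "state.raw", "Michoacán de Ocampo"))  # True
print(term_filter(doc, "state.raw", "michoacán de ocampo"))  # False
print(term_filter(doc, "state", "Michoacán de Ocampo"))      # False
```

The second lookup shows that a plain `keyword` sub-field is case-sensitive, so the filter value has to be sent exactly as it was indexed.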
The query below now returns both documents. Note the `.raw` suffix in the filter, which targets the `keyword` sub-field.
{
"query": {
"bool": {
"must": [
{
"match": {
"state": {
"query": "michoacán de ocampo"
}
}
},
{
"match": {
"colony": {
"query": "zamora"
}
}
},
{
"match": {
"city": {
"query": "zamora"
}
}
}
],
"filter": {
"term": {
"state.raw": "Michoacán de Ocampo"
}
}
}
}
}
EDIT: https://www.getpostman.com/collections/f4b9ed00d50e2f4bc7f4 is the Postman collection, if you want to test this quickly.
Upvotes: 2
Reputation: 2547
My guess is that the mapping of your `state` field is the default one, i.e., `state` is a `text` field with a `keyword` sub-field (see dynamic field mapping).
If that is the case, the filter in your first query "works" because it matches one of the tokens produced by the default text analyzer: "Michoacán de Ocampo" is analyzed into three lowercase tokens, ["michoacán", "de", "ocampo"].
The second filter cannot match for the same reason: a term filter is not analyzed, so the whole phrase "Michoacán de Ocampo", capitals included, is compared against those tokens as a single term. Filtering on the `keyword` sub-field should work instead:
{
"query": {
"bool": {
"must": [
{
"match": {
"state": {
"query": "michoacán de ocampo"
}
}
},
{
"match": {
"colony": {
"query": "zamora"
}
}
},
{
"match": {
"city": {
"query": "zamora"
}
}
}
],
"filter": {
"term": {
"state.keyword": "Michoacán de Ocampo"
}
}
}
}
}
Upvotes: 1