How do I apply a match query on a field whose value contains multiple words, e.g. "Los Angeles"? How can I match it against the data structure below?
"addresses" : [
  {
    "type" : "Home",
    "address" : "Los Angeles,CA,US"
  }
]
Below are my mappings and settings; I created custom settings and filters:
PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "type_table": [
            "# => ALPHANUM",
            "+ => ALPHANUM",
            "@ => ALPHANUM",
            "% => ALPHANUM",
            "~ => ALPHANUM",
            "^ => ALPHANUM",
            "$ => ALPHANUM",
            "& => ALPHANUM",
            "' => ALPHANUM",
            "\" => ALPHANUM",
            "\/ => ALPHANUM",
            ", => ALPHANUM"
          ],
          "preserve_original": "true",
          "generate_word_parts": false,
          "generate_number_parts": false,
          "split_on_case_change": false,
          "split_on_numerics": false,
          "stem_english_possessive": false
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "my_word_delimiter"
          ]
        }
      },
      "normalizer": {
        "keyword_lowercase": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "addresses": {
        "type": "nested",
        "properties": {
          "address": {
            "type": "text"
          },
          "type": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
I tried the query below, but I am not getting any results:
{
  "from": "0",
  "size": "30",
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "nested": {
                  "path": "addresses",
                  "query": {
                    "match": {
                      "addresses.address": {
                        "query": "Los Angeles",
                        "operator": "and"
                      }
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
Is there any problem with the settings I created?
You are not getting results when the address has a value like "Los Angeles,CA,US" because you are using the whitespace tokenizer. The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character. Since you are using the and operator with the match query, the query should only retrieve documents containing both Los and Angeles; but because of the whitespace tokenizer, no token for Angeles is generated, so no results are returned.
POST /_analyze
{
  "tokenizer": "whitespace",
  "text": "Los Angeles,CA,US"
}
The tokens are:
{
  "tokens": [
    {
      "token": "Los",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "Angeles,CA,US",
      "start_offset": 4,
      "end_offset": 17,
      "type": "word",
      "position": 1
    }
  ]
}
But in the case of "Los Angeles ,CA,US", since there is a whitespace after Angeles, the tokens generated are Los, Angeles, and ,CA,US.
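As an illustrative sketch (plain Python standing in for Elasticsearch's analysis chain, not the actual implementation), the whitespace split plus the and-operator semantics can be mimicked like this:

```python
# Approximate the whitespace tokenizer (plus the lowercase filter) with
# str.split(): punctuation stays attached to the adjacent word.
def whitespace_tokenize(text):
    return [t.lower() for t in text.split()]

# With "operator": "and", every analyzed query token must appear among
# the field's tokens for the document to match.
def match_and(field_tokens, query_tokens):
    return all(q in field_tokens for q in query_tokens)

doc_tokens = whitespace_tokenize("Los Angeles,CA,US")
query_tokens = whitespace_tokenize("Los Angeles")

print(doc_tokens)                            # ['los', 'angeles,ca,us']
print(match_and(doc_tokens, query_tokens))   # False: no standalone 'angeles' token
```

The query token angeles never equals the field token angeles,ca,us, which is exactly why the and-operator match returns nothing.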
Adding a working example with index data, mapping, and search result.
Index Mapping:
Keep the mapping the same, apart from changing the tokenizer from "whitespace" to "standard".
Analyze API
The standard tokenizer provides grammar-based tokenization.
POST /_analyze
{
  "tokenizer": "standard",
  "text": "Los Angeles ,CA,US"
}
The tokens are:
{
  "tokens": [
    {
      "token": "Los",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "Angeles",
      "start_offset": 4,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "CA",
      "start_offset": 13,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "US",
      "start_offset": 16,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
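For this particular input, the standard tokenizer's behavior can be roughly approximated in Python with a regular expression (the real tokenizer is Unicode-aware and grammar-based, so this is only a sketch):

```python
import re

# Rough stand-in for the standard tokenizer + lowercase filter for this
# example: extract runs of ASCII alphanumerics, dropping commas and spaces.
def standard_tokenize(text):
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

tokens = standard_tokenize("Los Angeles,CA,US")
print(tokens)  # ['los', 'angeles', 'ca', 'us']

# Now an AND match on "Los Angeles" succeeds: both query tokens are present.
query = standard_tokenize("Los Angeles")
print(all(q in tokens for q in query))  # True
```

Because angeles is now its own token, the and-operator match query finds both required terms.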
Index Data:
{
  "addresses": [
    {
      "type": "Home",
      "address": "Los Angeles,CA,US"
    }
  ]
}
Using the same search query as given in the question.
Search Result:
"hits": [
  {
    "_index": "64624353",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.26706278,
    "_source": {
      "addresses": [
        {
          "type": "Home",
          "address": "Los Angeles,CA,US"
        }
      ]
    }
  }
]
NOTE: If you want to keep the whitespace tokenizer, then remove "operator": "and" from the search query, and you will get the required result.
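To see why dropping the and operator helps, here is a minimal Python sketch (again a stand-in, not Elasticsearch itself) of the match query's default or semantics: one matching query token is enough.

```python
# Whitespace tokenizer + lowercase filter, as before.
def whitespace_tokenize(text):
    return [t.lower() for t in text.split()]

# Default match behavior ("operator": "or"): any query token present
# in the field tokens makes the document match.
def match_or(field_tokens, query_tokens):
    return any(q in field_tokens for q in query_tokens)

doc = whitespace_tokenize("Los Angeles,CA,US")   # ['los', 'angeles,ca,us']
print(match_or(doc, whitespace_tokenize("Los Angeles")))  # True: 'los' matches
```

The trade-off is precision: with or, a document containing only Los (say, "Los Gatos") would also match, just with a lower score.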
Update 1:
Try using this updated mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "type_table": [
            "# => ALPHANUM",
            "+ => ALPHANUM",
            "@ => ALPHANUM",
            "% => ALPHANUM",
            "~ => ALPHANUM",
            "^ => ALPHANUM",
            "$ => ALPHANUM",
            "& => ALPHANUM",
            "' => ALPHANUM",
            "\" => ALPHANUM",
            "\/ => ALPHANUM"
          ],
          "preserve_original": "true",
          "generate_word_parts": true,
          "generate_number_parts": false,
          "split_on_case_change": false,
          "split_on_numerics": false,
          "stem_english_possessive": false
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "my_word_delimiter"
          ]
        }
      },
      "normalizer": {
        "keyword_lowercase": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "addresses": {
        "type": "nested",
        "properties": {
          "address": {
            "type": "text"
          },
          "type": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Two changes were made here: "generate_word_parts" is set to true, so that the filter includes tokens consisting of alphabetical characters in the output, and ", => ALPHANUM" is removed from the type_table, so that the filter splits tokens on commas.
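A simplified Python sketch of what this updated analysis chain produces for the address (the functions below are illustrative stand-ins, not the real word_delimiter implementation; only the comma-splitting relevant to this example is modeled):

```python
# Simplified word_delimiter: with ',' no longer mapped to ALPHANUM and
# generate_word_parts=true, the filter splits each token on commas;
# preserve_original additionally keeps the unsplit token.
def word_delimiter(token):
    parts = [p for p in token.split(",") if p]
    out = []
    if len(parts) > 1:
        out.append(token)  # preserve_original emits the untouched token too
    out.extend(parts)
    return out

# Whitespace tokenizer + lowercase filter + word_delimiter, chained
# in the same order as the analyzer definition above.
def analyze(text):
    tokens = []
    for t in text.lower().split():
        tokens.extend(word_delimiter(t))
    return tokens

print(analyze("Los Angeles,CA,US"))
# ['los', 'angeles,ca,us', 'angeles', 'ca', 'us']
```

Since angeles is now emitted as its own token, the original and-operator query for "Los Angeles" matches while the whitespace tokenizer is kept.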