Reputation: 33
I'm working with Elasticsearch and have run into the following problem. I defined an analyzer that uses a shingle filter and created a mapping with it.
Here's the code:
{
  "settings": {
    "analysis": {
      "char_filter": {
        "icons": {
          "type": "mapping",
          "mappings_path": "analysis/char_filter.txt"
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym_filter.txt"
        },
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 2,
          "min_shingle_size": 2,
          "output_unigrams": true,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "synonym_filter",
            "shingle_filter"
          ],
          "char_filter": [
            "icons"
          ],
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "type-0": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
Then I put a document in the index:
{
  "text": "hello"
}
After this, I search like this:
{
  "query": {
    "match": {
      "text": {
        "query": "hell world",
        "fuzziness": 1
      }
    }
  }
}
But it matches nothing. Then I changed my query to:
{
  "query": {
    "match": {
      "text": {
        "query": "world hell",
        "fuzziness": 1
      }
    }
  }
}
This request returns the document:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.21576157,
    "hits": [
      {
        "_index": "index-001",
        "_type": "product",
        "_id": "1",
        "_score": 0.21576157,
        "_source": {
          "text": "hello"
        }
      }
    ]
  }
}
My Elasticsearch version is 6.2.4.
Can anyone tell me the reason?
Upvotes: 3
Views: 1206
Reputation: 32386
Combining fuzziness with the shingle_filter is causing the issue. The documentation for fuzziness in the match query includes this note:
Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position. Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.
The important part is "in cases where the analysis process produces multiple tokens at the same position": fuzziness is not applied to tokens that share a position.
Now let's inspect the tokens generated for your search term hell world.
{
  "tokens": [
    {
      "token": "hell",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0 // position 0 for hell
    },
    {
      "token": "hellworld",
      "start_offset": 0,
      "end_offset": 10,
      "type": "shingle",
      "position": 0, // again position 0, this time for hellworld
      "positionLength": 2
    },
    {
      "token": "world",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1 // position 1
    }
  ]
}
So for the position 0 tokens hell and hellworld, fuzziness is not applied. hell therefore doesn't match the indexed token hello, and the query returns no result.
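To make the position clash easier to see, here is a minimal Python sketch that groups the tokens shown above by position. The token list is copied by hand from that output (the shape the _analyze API returns), keeping only the two fields that matter here. The helper name shared_positions is hypothetical and exists only for this illustration.

```python
from collections import defaultdict

# Tokens copied from the analysis output for "hell world"
# (only the fields needed for this illustration).
tokens = [
    {"token": "hell", "position": 0},
    {"token": "hellworld", "position": 0},
    {"token": "world", "position": 1},
]


def shared_positions(tokens):
    """Return {position: [tokens]} for positions that hold more than one token."""
    by_position = defaultdict(list)
    for t in tokens:
        by_position[t["position"]].append(t["token"])
    return {pos: toks for pos, toks in by_position.items() if len(toks) > 1}


print(shared_positions(tokens))  # {0: ['hell', 'hellworld']}
```

Any position that shows up in the result is one where, per the note quoted above, fuzzy matching is skipped.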
Now inspect the tokens of world hell:
{
  "tokens": [
    {
      "token": "world",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "worldhell",
      "start_offset": 0,
      "end_offset": 10,
      "type": "shingle",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "hell",
      "start_offset": 6,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1 // hell is the only token at position 1, so fuzziness will be applied
    }
  ]
}
Now when you query with world hell, fuzziness is applied to the hell token, which matches the indexed token hello, and the search result is returned.
If you change the search term to world hell elastic, hell no longer has a unique position (the shingle hellelastic is emitted at the same position 1), so the query returns no results again. Hope this clears things up.
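The three cases can be compared side by side with a rough simulation. This is only a sketch under stated assumptions: it approximates my_analyzer with lowercasing and whitespace splitting, ignores the char filter and synonym filter, and mimics the shingle settings from the question (shingle size 2, unigrams kept, empty separator). The function names shingle_tokens and fuzzy_eligible are hypothetical; this is not how Elasticsearch implements the filter.

```python
def shingle_tokens(text):
    """Approximate the analyzer: each word is emitted at its own position,
    plus a two-word shingle (empty separator) starting at the same position."""
    words = text.lower().split()
    out = []
    for pos, word in enumerate(words):
        out.append((pos, word))
        if pos + 1 < len(words):
            out.append((pos, word + words[pos + 1]))
    return out


def fuzzy_eligible(text):
    """Tokens that are alone at their position, i.e. the ones to which
    fuzziness can still be applied according to the quoted note."""
    tokens = shingle_tokens(text)
    counts = {}
    for pos, _ in tokens:
        counts[pos] = counts.get(pos, 0) + 1
    return [tok for pos, tok in tokens if counts[pos] == 1]


for query in ("hell world", "world hell", "world hell elastic"):
    print(query, "->", fuzzy_eligible(query))
# hell world -> ['world']
# world hell -> ['hell']
# world hell elastic -> ['elastic']
```

Only the last word of a query ever sits alone at its position with these settings, which is why hell fuzzy-matches hello only when it comes last.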
Upvotes: 4