I've tried matching with synonyms in various combinations so that I can query simply for n1 and find items containing N°1 (that's the degree symbol).
If I search for N°1, I can find the desired rows without a problem. The synonym filter itself does work: if I search for 'test', it matches 'testword'. I wonder whether the asciifolding or lowercase filters could be interfering with the degree symbol, or something in the standard tokenizer (even removing these filters doesn't make a difference).
This is from the index settings:
filter: {
  exampleSynonyms: {
    type: 'synonym',
    synonyms: [
      'n1, no1, number1, no 1, n 1, number 1 => N°1',
      'test => testword'
    ]
  },
  exampleStops: {
    type: 'stop',
    stopwords: ['N°1', 'n°1']
  },
  exampleAscii: {
    type: 'asciifolding',
    preserve_original: true
  }
},
analyzer: {
  default_search: {
    tokenizer: 'standard',
    filter: ['exampleStops', 'exampleSynonyms', 'lowercase', 'exampleAscii']
  }
}
What could prevent the ° from being used in a synonym?
PS. The degree character is within the extended ASCII (Latin-1) set.
The problem here is that the standard tokenizer removes ° before the text ever reaches the synonym filter. You can verify this with the analyze API:
curl -XGET 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '
{
  "tokenizer": "standard",
  "text": "N°1"
}'
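For a quick offline illustration, here is a rough Python approximation of what that analyze call reports. The helper approx_standard_tokenize is hypothetical: the real standard tokenizer implements the Unicode UAX #29 word-boundary rules, while this sketch only keeps runs of Unicode word characters. That is still enough to show the degree sign acting as a separator.

```python
import re
from typing import List


def approx_standard_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for the `standard` tokenizer: keep runs of Unicode
    # word characters. The degree sign (U+00B0) is not a word character, so it
    # acts as a separator and is discarded, like in the analyze output.
    return re.findall(r"\w+", text)


tokens = approx_standard_tokenize("N°1")
print(tokens)  # ['N', '1']

# Token filters (stop, synonym, ...) only ever see the tokens above, so a rule
# that targets the single term "N°1" can never match it.
print("N°1" in tokens)  # False
```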
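You will see two tokens, N and 1. Token filters (including synonym) are applied only after tokenization, so instead of synonyms you could use a pattern_replace char filter to replace the degree symbol with an empty string. Char filters run before the tokenizer, so the symbol is gone before it can split the term. This is a minimal setup: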
PUT degree
{
  "settings": {
    "analysis": {
      "analyzer": {
        "degree_analyzer": {
          "char_filter": [
            "degree_mapping"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "degree_mapping": {
          "type": "pattern_replace",
          "pattern": "°",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "mydoctype": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "degree_analyzer"
        }
      }
    }
  }
}
With this, N°1 will be indexed as n1, and a simple match query will give you the desired results:
GET degree/_search
{
  "query": {
    "match": {
      "title": "n1"
    }
  }
}
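To see the whole chain end to end, the sketch below approximates degree_analyzer in plain Python: the char filter strips °, then tokenizing, lowercasing and folding run in the same order as in the settings. All helper names are hypothetical, the tokenizer and ASCII folding are rough approximations (not Lucene's implementations), and simple_match only models a match query with the default OR operator.

```python
import re
import unicodedata
from typing import List


def degree_char_filter(text: str) -> str:
    # Mirrors the pattern_replace char filter: "°" -> "" before tokenizing.
    return re.sub("°", "", text)


def approx_standard_tokenize(text: str) -> List[str]:
    # Rough stand-in for the standard tokenizer: runs of Unicode word characters.
    return re.findall(r"\w+", text)


def approx_ascii_fold(token: str) -> str:
    # Rough stand-in for asciifolding: decompose accented letters, then drop
    # whatever is still non-ASCII (the real filter has a much larger mapping table).
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def degree_analyzer(text: str) -> List[str]:
    # char_filter -> tokenizer -> lowercase -> asciifolding, in that order.
    tokens = approx_standard_tokenize(degree_char_filter(text))
    folded = [approx_ascii_fold(t.lower()) for t in tokens]
    return [t for t in folded if t]


def simple_match(query: str, field_value: str) -> bool:
    # With no search_analyzer set, the field's analyzer is also applied to the
    # query text; a default (OR) match succeeds if any analyzed term overlaps.
    return bool(set(degree_analyzer(query)) & set(degree_analyzer(field_value)))


print(degree_analyzer("N°1"))      # ['n1']
print(simple_match("n1", "N°1"))   # True
print(simple_match("n1", "N°12"))  # False
```

Because both the stored N°1 and the query n1 end up as the single term n1, no synonym rule is needed for this case.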
Hope this helps.