I'm trying a simple test of Elasticsearch synonyms without success. This is what I have so far:
POST /mysearch
{
"settings" : {
"number_of_shards" : 5,
"number_of_replicas" : 0,
"analysis": {
"filter" : {
"my_ascii_folding" : {
"type" : "asciifolding",
"preserve_original" : true
},
"my_stopwords": {
"type": "stop",
"stopwords": [ ]
},
"mysynonym" : {
"type" : "synonym",
"synonyms" : [
"foo => bar"
]
}
},
"char_filter": {
"my_htmlstrip": {
"type": "html_strip"
}
},
"analyzer": {
"index_text_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
},
"index_html_analyzer":{
"type": "custom",
"tokenizer": "standard",
"char_filter": "my_htmlstrip",
"filter": [ "lowercase", "my_stopwords", "my_ascii_folding" ]
},
"search_text_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": [ "mysynonym", "lowercase", "my_stopwords" ]
}
}
}
},
"mappings" : {
"news" : {
"_source" : { "enabled" : true },
"_all" : {"enabled" : false},
"properties" : {
"name" : { "type" : "string", "index" : "analyzed", "store": "yes" , "analyzer": "index_text_analyzer" , "search_analyzer": "search_text_analyzer" }
}
}
}
}
Add some documents:
POST /mysearch/news
{
"name":"foo kar"
}
POST /mysearch/news
{
"name":"bar kar"
}
Do a search
POST /mysearch/_search?q=name:foo
{
}
This gives me the result that matches foo, not bar. Why?
I think you are doing it wrong, for the following reasons:

"foo => bar"? This means that you replace foo with bar, whereas if they are synonyms, both should be indexed. So I would use "foo,bar" instead. Let me give you an example: assume you index foo kar. Since bar is a synonym of foo, you'd want to index that synonym as well, so that the index contains foo, bar and kar. This way, if you search for foo or bar, that document WILL be found, even though the original text didn't contain bar.
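To make the difference between the two rule styles concrete, here is a toy model in Python. It is not Elasticsearch code: it represents the filter output as one set of terms per token position, only handles single-word rules, and assumes the synonym filter's default expand=true behaviour for comma-separated rules. The function names are hypothetical.

```python
def parse_rule(rule):
    """Turn one single-word synonym rule into a {term: [terms emitted]} map."""
    if "=>" in rule:
        # Explicit mapping: "foo => bar" replaces foo with bar (foo is dropped).
        lhs, rhs = rule.split("=>")
        targets = [t.strip() for t in rhs.split(",")]
        return {t.strip(): targets for t in lhs.split(",")}
    # Equivalent synonyms (assuming expand=true): every term emits all terms.
    terms = [t.strip() for t in rule.split(",")]
    return {t: terms for t in terms}


def apply_synonyms(tokens, rules):
    """Return one set of terms per input position, like stacked tokens."""
    mapping = {}
    for rule in rules:
        mapping.update(parse_rule(rule))
    # Tokens without a rule pass through unchanged.
    return [set(mapping.get(tok, [tok])) for tok in tokens]


print(apply_synonyms(["foo", "kar"], ["foo => bar"]))  # foo is replaced by bar
print(apply_synonyms(["foo", "kar"], ["foo,bar"]))     # foo and bar share a position
```

With "foo => bar" the term foo never reaches the index or the query; with "foo,bar" both terms are emitted, which is what you want for real synonyms.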
That being said, I would suggest the following:
POST /mysearch
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 0,
"analysis": {
"filter": {
"my_ascii_folding": {
"type": "asciifolding",
"preserve_original": true
},
"my_stopwords": {
"type": "stop",
"stopwords": []
},
"mysynonym": {
"type": "synonym",
"synonyms": [
"foo,bar"
]
}
},
"char_filter": {
"my_htmlstrip": {
"type": "html_strip"
}
},
"analyzer": {
"index_text_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"my_stopwords",
"my_ascii_folding"
]
},
"index_html_analyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": "my_htmlstrip",
"filter": [
"lowercase",
"my_stopwords",
"my_ascii_folding"
]
},
"search_text_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"mysynonym",
"lowercase",
"my_stopwords"
]
}
}
}
},
"mappings": {
"news": {
"_source": {
"enabled": true
},
"_all": {
"enabled": false
},
"properties": {
"name": {
"type": "string",
"index": "analyzed",
"store": "yes",
"analyzer": "search_text_analyzer"
}
}
}
}
}
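As a rough sketch of what this setup does, here is a toy inverted index in Python where the same synonym-aware analysis runs at index time and at search time, as with "analyzer": "search_text_analyzer" above. It is not Elasticsearch code: whitespace splitting stands in for the standard tokenizer, only single-word "a,b" rules are supported, positions are ignored, and the helpers analyze, build_index and search are hypothetical.

```python
def analyze(text, synonyms):
    """Toy analyzer: lowercase, whitespace split, equivalent-synonym expansion."""
    groups = [set(t.strip() for t in rule.split(",")) for rule in synonyms]
    terms = set()
    for tok in text.lower().split():
        terms.add(tok)
        for group in groups:
            if tok in group:
                terms |= group  # emit every synonym in the group
    return terms


def build_index(docs, synonyms):
    """Map each indexed term to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, synonyms):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, synonyms):
    """Return ids of documents matching any analyzed query term (OR)."""
    hits = set()
    for term in analyze(query, synonyms):
        hits |= index.get(term, set())
    return hits


docs = {1: "foo kar", 2: "bar kar"}
rules = ["foo,bar"]
index = build_index(docs, rules)  # synonyms are expanded at index time
print(sorted(search(index, "foo", rules)))  # [1, 2]
print(sorted(search(index, "bar", rules)))  # [1, 2]
```

Because both documents end up with foo and bar in the index, either query finds both of them.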
Or, if you don't want to index the synonyms, and would rather index only the original text and search for the synonyms as well at search time, make the following changes:

1. "synonyms": ["foo,bar"], because, as I mentioned above, otherwise you will replace foo with bar.
2. "index_analyzer": "index_text_analyzer", "search_analyzer": "search_text_analyzer"

These two changes will result in your text being indexed as is (with no synonyms), but at search time, when you search for foo, Elasticsearch will search for its synonym as well: foo or bar.
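A minimal sketch of this search-time-only variant, again as a toy Python model rather than Elasticsearch code: documents are indexed without any synonym expansion (like index_text_analyzer), and the "foo,bar" rule is applied only to the query. Whitespace splitting stands in for the standard tokenizer, only single-word rules are handled, and analyze and search are hypothetical helpers.

```python
def analyze(text, synonyms=()):
    """Toy analyzer: lowercase, whitespace split, optional synonym expansion."""
    groups = [set(t.strip() for t in rule.split(",")) for rule in synonyms]
    terms = set()
    for tok in text.lower().split():
        terms.add(tok)
        for group in groups:
            if tok in group:
                terms |= group
    return terms


docs = {1: "foo kar", 2: "bar kar"}
rules = ["foo,bar"]

# Index time: no synonym filter, so the text is indexed as is.
index = {}
for doc_id, text in docs.items():
    for term in analyze(text):
        index.setdefault(term, set()).add(doc_id)


def search(query, synonyms=()):
    """Return ids of documents matching any analyzed query term (OR)."""
    hits = set()
    for term in analyze(query, synonyms):
        hits |= index.get(term, set())
    return hits


print(sorted(search("foo")))         # [1]: no synonyms at search time
print(sorted(search("foo", rules)))  # [1, 2]: foo is expanded to foo OR bar
```

The trade-off in this model: the index stays smaller and the synonym list can change without reindexing, since only the query side depends on it.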