secmask

Reputation: 8107

How do I get Elasticsearch synonyms to work?

I'm trying a simple test of Elasticsearch synonyms without success. This is what I have so far:

POST /mysearch
{
    "settings" : {
        "number_of_shards" :   5,
        "number_of_replicas" : 0,
        "analysis": {
            "filter" : {
                "my_ascii_folding" : {
                    "type" : "asciifolding",
                    "preserve_original" : true
                },
                "my_stopwords": {
                    "type":       "stop",
                    "stopwords": [ ]
                },
                "mysynonym" : {
                    "type" : "synonym",
                    "synonyms" : [
                        "foo => bar"
                    ]
                }
            },
            "char_filter": {
                "my_htmlstrip": {
                    "type": "html_strip"
                }
            }, 
            "analyzer": {
                "index_text_analyzer":{
                    "type": "custom",
                    "tokenizer":    "standard",
                    "filter":       [ "lowercase", "my_stopwords", "my_ascii_folding" ]
                },
                "index_html_analyzer":{
                    "type": "custom",
                    "tokenizer":    "standard",
                    "char_filter": "my_htmlstrip",
                    "filter":       [ "lowercase", "my_stopwords", "my_ascii_folding" ]
                },
                "search_text_analyzer":{
                    "type": "custom",
                    "tokenizer":    "standard",
                    "filter":       [ "mysynonym", "lowercase", "my_stopwords" ]
                }
            }
        }
    },
    "mappings" : {
        "news" : {
            "_source" : { "enabled" : true },
            "_all" : {"enabled" : false},
            "properties" : {
                "name" : { "type" : "string", "index" : "analyzed", "store": "yes" , "analyzer": "index_text_analyzer" , "search_analyzer": "search_text_analyzer" }
            }
        }
    }
}

Add some documents

POST /mysearch/news
{
    "name":"foo kar"
}
POST /mysearch/news
{
    "name":"bar kar"
}

Do a search

POST /mysearch/_search?q=name:foo
{

}

The search returns the document that matches foo, not the one that matches bar. Why?

Upvotes: 1

Views: 1584

Answers (1)

Andrei Stefan

Reputation: 52368

I think you are doing it wrong, for the following reasons:

  1. Why do you use foo => bar? This rule replaces foo with bar, whereas if the two are synonyms, both should be indexed. So I would use foo,bar instead.
  2. Why do you use a different analyzer at indexing time than at search time? At indexing time you will want your text to be indexed together with its synonyms.

Let me give you an example. Assume you index foo kar. Since bar is a synonym of foo, you'd want to index the synonym as well, so that the index contains foo, bar, and kar. That way, if you search for foo or bar, the document WILL be found even though the original text didn't contain bar.
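To make the difference between the two rule forms concrete, here is a toy Python model of a synonym filter. It is only a sketch and not Elasticsearch's implementation: `parse_rule` and `analyze` are hypothetical helper names, tokenization is plain whitespace splitting, and lowercasing happens before the synonym step for simplicity.

```python
def parse_rule(rule):
    """Turn one synonym rule into a {token: [replacement tokens]} mapping."""
    if "=>" in rule:
        # Explicit mapping: the left-hand side is replaced by the right-hand side.
        left, right = rule.split("=>")
        targets = [t.strip() for t in right.split(",")]
        return {t.strip(): targets for t in left.split(",")}
    # Equivalent synonyms: every term expands to the whole group.
    group = [t.strip() for t in rule.split(",")]
    return {t: group for t in group}


def analyze(text, rule):
    """Whitespace-tokenize, lowercase, then apply the synonym rule (toy model)."""
    mapping = parse_rule(rule)
    tokens = []
    for token in text.lower().split():
        # Tokens without a rule pass through unchanged.
        tokens.extend(mapping.get(token, [token]))
    return tokens


print(analyze("foo kar", "foo => bar"))  # ['bar', 'kar']
print(analyze("foo kar", "foo,bar"))     # ['foo', 'bar', 'kar']
```

With foo => bar the token foo disappears from the output, while foo,bar keeps it and adds bar, which is the behaviour described above.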

That being said, I would suggest the following:

POST /mysearch
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "my_ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        },
        "my_stopwords": {
          "type": "stop",
          "stopwords": []
        },
        "mysynonym": {
          "type": "synonym",
          "synonyms": [
            "foo,bar"
          ]
        }
      },
      "char_filter": {
        "my_htmlstrip": {
          "type": "html_strip"
        }
      },
      "analyzer": {
        "index_text_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stopwords",
            "my_ascii_folding"
          ]
        },
        "index_html_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": "my_htmlstrip",
          "filter": [
            "lowercase",
            "my_stopwords",
            "my_ascii_folding"
          ]
        },
        "search_text_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "mysynonym",
            "lowercase",
            "my_stopwords"
          ]
        }
      }
    }
  },
  "mappings": {
    "news": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": false
      },
      "properties": {
        "name": {
          "type": "string",
          "index": "analyzed",
          "store": "yes",
          "analyzer": "search_text_analyzer"
        }
      }
    }
  }
}

Alternatively, if you don't want to index the synonyms, you can index only the original text and expand synonyms only at search time. To do that, make the following changes:

  • "synonyms": ["foo,bar"] because, as mentioned above, foo => bar replaces foo with bar
  • explicitly specify the two analyzers:
"index_analyzer": "index_text_analyzer",
"search_analyzer": "search_text_analyzer"

With these two changes, your text is indexed as is (with no synonyms), but at search time, when you search for foo, Elasticsearch searches for its synonym as well: foo or bar.
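The search-time variant can be sketched the same way. This is again a toy Python model under simplifying assumptions, not how Elasticsearch scores or matches documents: `SYNONYMS`, `index_terms`, `search_terms`, and `search` are hypothetical names, and a document counts as a hit if it shares at least one term with the expanded query.

```python
# Models the rule "foo,bar": each term expands to the whole group.
SYNONYMS = {"foo": {"foo", "bar"}, "bar": {"foo", "bar"}}


def index_terms(text):
    """Index-time analysis: lowercase and split, with no synonym expansion."""
    return set(text.lower().split())


def search_terms(query):
    """Search-time analysis: lowercase, split, and expand synonyms."""
    terms = set()
    for token in query.lower().split():
        terms |= SYNONYMS.get(token, {token})
    return terms


def search(query, docs):
    """Return the documents that share at least one term with the query."""
    wanted = search_terms(query)
    return [doc for doc in docs if index_terms(doc) & wanted]


docs = ["foo kar", "bar kar"]
print(search("foo", docs))  # ['foo kar', 'bar kar']
print(search("baz", docs))  # []
```

Both documents are found for foo even though the second one was indexed without that term, because the expansion happens on the query side only.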

Upvotes: 3
