satnam
satnam

Reputation: 1425

failed to find global token filter under [synonym]

So from the documentation on this page, seems like I can build a custom transient analyzer out of tokenizers, token filters and char filters and test it against my sample text using the Analyze API.

The goal is that I want to see if the synonym token filter satisfies my needs in terms of which terms get labelled as synonyms and which do not.

But when I do

curl -XGET 'localhost:9200/_analyze?char_filters=html_strip&tokenizer=whitespace&token_filters=synonym' -d 'man and male are same'

Rather than getting a result, I get

{
  "error": "ElasticsearchIllegalArgumentException[failed to find global token filter under [synonym]]",
  "status": 400
}

Any ideas what I'm doing wrong here?

Upvotes: 2

Views: 1424

Answers (1)

Carl G
Carl G

Reputation: 18250

It is currently not possible to use an ad-hoc synonym token filter due to the implementation requiring access "to the tokenizer factories for the index." (See elasticsearch Github issue.) Unfortunately this limitation is currently undocumented for the docs on using custom token filters on the _analyze endpoint

Here are some example commands to create and update a synonym token filter using the method of re-opening the index:

# create index with filter
curl -v -X PUT -s -H 'Content-Type: application/json' 'localhost:9200/syn_test_idx' -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "test_synonym_filter" : {
          "type" : "synonym",
          "synonyms" : [
            "i-pod, i pod => ipod",
            "universe, cosmos"
          ]
        }
      }
    }
  }
}

# test token filter
' | jq .
curl -X POST -s -H 'Content-Type: application/json' 'localhost:9200/syn_test_idx/_analyze' -d '{
  "tokenizer": "standard",
  "filter": ["global_synonym_filter"],
  "text": "cow i phone"
}' | jq .

("i phone" not caught by synonym list.)

# update index
curl -X POST -s 'localhost:9200/syn_test_idx/_close' | jq .
curl -X PUT -s -H 'Content-Type: application/json' 'localhost:9200/syn_test_idx/_settings' -d '{
  "analysis" : {
    "filter": {
      "test_synonym_filter":{
        "type":"synonym",
        "synonyms" : [
          "i-pod, i pod => ipod",
          "universe, cosmos",
          "i-phone, i phone => iphone"
        ]
      }
    }
  }
}' | jq .
curl -X POST -s 'localhost:9200/syn_test_idx/_open' | jq .

# test token filter
' | jq .
curl -X POST -s -H 'Content-Type: application/json' 'localhost:9200/syn_test_idx/_analyze' -d '{
  "tokenizer": "standard",
  "filter": ["global_synonym_filter"],
  "text": "cow i phone"
}' | jq .

("i phone" translated into "iphone" by synonym list.)

(On an unrelated note, my zsh/YADR setup for some reason doesn't show post response bodies, hence I am piping it through jq.)

Upvotes: 2

Related Questions