Eduardo Junior

Reputation: 350

How to properly handle multi words synonym expansion using elasticsearch?

I have the following synonym expansion:

suco => suco, refresco, bebida de soja

What I want is to tokenize the search this way:

Search for "suco de laranja" would be tokenized to ["suco", "laranja", "refresco", "bebida de soja"].

But I'm getting it tokenized to ["suco", "laranja", "refresco", "bebida", "soja"].

Consider that "de" is a stop word. I want it to be ignored in queries, so that "bebida de laranja" becomes ["bebida", "laranja"]. But I don't want it considered during synonym tokenization, so that "bebida de soja" still stays as the single token "bebida de soja".

My settings:

{
    "settings":{
        "analysis":{
            "filter":{
                "synonym_br":{
                    "type":"synonym",
                    "synonyms":[
                        "suco => suco, refresco, bebida de soja"
                    ]
                },
                "brazilian_stop":{
                    "type":"stop",
                    "stopwords":"_brazilian_"
                }
            },
            "analyzer":{
                "synonyms":{
                    "filter":[
                        "synonym_br",
                        "lowercase",
                        "brazilian_stop",
                        "asciifolding"
                    ],
                    "type":"custom",
                    "tokenizer":"standard"
                }
            }
        }
    }
}

Upvotes: 2

Views: 1776

Answers (1)

Nishant

Reputation: 7874

I would suggest making the following two changes. The first directly relates to the question you asked; the second is a general suggestion.

  1. Instead of expanding one word into multiple synonyms, do the opposite: contract all the synonyms to a single word. So change "suco => suco, refresco, bebida de soja" to "suco, refresco, bebida de soja => suco".

  2. Change the order of filters in the synonyms analyzer: place lowercase before synonym_br. This ensures that case doesn't affect the synonym_br token filter.

So the final settings will be:

{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_br": {
          "type": "synonym",
          "synonyms": [
            "suco, refresco, bebida de soja => suco"
          ]
        },
        "brazilian_stop": {
          "type": "stop",
          "stopwords": "_brazilian_"
        }
      },
      "analyzer": {
        "synonyms": {
          "filter": [
            "lowercase",
            "synonym_br",
            "brazilian_stop",
            "asciifolding"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
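
Once an index is created with these settings, the behaviour can be checked directly with the _analyze API. A sketch of such a request (the index name my_index is just a placeholder):

```
POST my_index/_analyze
{
  "analyzer": "synonyms",
  "text": "bebida de soja"
}
```

With the contraction rule in place, this should return the single token suco.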

How does this work?

For the input bebida de soja, the filters apply in the following order:

Filter              Result tokens
====================================
lowercase           bebida, de, soja
synonym_br          suco             <------- all the above tokens (including positions) exactly match a synonym
brazilian_stop      suco
asciifolding        suco

Let's see brazilian_stop in action. For this we need an input that doesn't match the synonym but contains de, e.g. de soja:

Filter              Result tokens
=================================
lowercase           de, soja
synonym_br          de, soja  <------- none of the tokens (independently or combined, including positions) matches any synonym
brazilian_stop      soja      <------- de is removed as it is a stopword
asciifolding        soja
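
The two traces above can be simulated with a minimal, self-contained sketch. This is not Elasticsearch's actual implementation; it just models the key point, that the synonym contraction runs before stopword removal. The synonym table and stopword set below are stand-ins for the real rule and the _brazilian_ list, and asciifolding is omitted since it is a no-op for these inputs.

```python
STOPWORDS = {"de"}  # stand-in for the _brazilian_ stopword list

# Contraction rules: each left-hand token sequence maps to a single token,
# mirroring "suco, refresco, bebida de soja => suco"
SYNONYMS = {
    ("suco",): "suco",
    ("refresco",): "suco",
    ("bebida", "de", "soja"): "suco",
}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def synonym_br(tokens):
    """Greedy left-to-right contraction, longest match first."""
    out, i = [], 0
    max_len = max(len(k) for k in SYNONYMS)
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.append(SYNONYMS[key])
                i += n
                break
        else:  # no rule matched at position i; keep the token as-is
            out.append(tokens[i])
            i += 1
    return out

def brazilian_stop(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def analyze(text):
    tokens = text.split()  # stand-in for the standard tokenizer
    for f in (lowercase, synonym_br, brazilian_stop):
        tokens = f(tokens)
    return tokens

print(analyze("bebida de soja"))   # ['suco']
print(analyze("de soja"))          # ['soja']
print(analyze("suco de laranja"))  # ['suco', 'laranja']
```

Note that running brazilian_stop before synonym_br would strip de first and "bebida de soja" could never match the three-token rule, which is why the filter order in the settings matters.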

Upvotes: 2
