Imran Azad

Reputation: 1058

Multiword Synonyms and Phrase Queries

Is there a mistake in the Elastic documentation?

Given the following index mapping:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "usa,united states,u s a,united states of america"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

Given this document:

PUT /my_index/country/1
{
  "title" : "The United States is wealthy"
}

The documentation states:

These phrases would not match:

The usa is wealthy

The united states of america is wealthy

The U.S.A. is wealthy

However, these phrases would:

United states is wealthy

Usa states of wealthy

The U.S. of wealthy

U.S. is america
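
The documented lists follow from where the index-time synonym filter puts the expanded tokens. Below is a minimal Python sketch of that idea, not Elasticsearch code. `tokenize`, `index_positions` and `phrase_matches` are hypothetical helpers. The tokenizer is a simplification of the standard tokenizer, and the expansion assumes that every synonym phrase is written starting at the position where the match begins, so longer phrases overlap the words that follow. A phrase query analyzed with the standard analyzer then matches if its terms occur at consecutive positions.

```python
import re

# Hypothetical, simplified tokenizer (NOT Lucene's standard tokenizer):
# lowercase, then split on any run of non-alphanumeric characters,
# so "U.S.A." becomes ["u", "s", "a"].
def tokenize(text):
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

# The single equivalence group from the my_synonym_filter rule.
RULE = "usa,united states,u s a,united states of america"
GROUP = [tokenize(phrase) for phrase in RULE.split(",")]

def index_positions(text):
    """Return one set of tokens per position, modelling index-time expansion.

    Wherever the original tokens starting at position i spell one of the
    synonym phrases, every phrase in the group is written starting at i,
    so multi-word synonyms spill onto the positions of the following words.
    """
    tokens = tokenize(text)
    positions = [{t} for t in tokens]
    for i in range(len(tokens)):
        if any(tokens[i:i + len(p)] == p for p in GROUP):
            for phrase in GROUP:
                for offset, tok in enumerate(phrase):
                    while len(positions) <= i + offset:
                        positions.append(set())
                    positions[i + offset].add(tok)
    return positions

def phrase_matches(positions, phrase):
    """True if the phrase's tokens occur at consecutive positions (slop 0)."""
    query = tokenize(phrase)
    return any(
        all(query[k] in positions[start + k] for k in range(len(query)))
        for start in range(len(positions) - len(query) + 1)
    )

if __name__ == "__main__":
    doc = index_positions("The United States is wealthy")
    for phrase in ["The usa is wealthy",
                   "The united states of america is wealthy",
                   "The U.S.A. is wealthy"]:
        print(phrase, "->", phrase_matches(doc, phrase))  # False for all three
    for phrase in ["United states is wealthy", "Usa states of wealthy",
                   "The U.S. of wealthy", "U.S. is america"]:
        print(phrase, "->", phrase_matches(doc, phrase))  # True for all four
    # Without synonyms in the field's analyzer there is one token per
    # position, so only the literal wording can match.
    plain = [{t} for t in tokenize("The United States is wealthy")]
    print(phrase_matches(plain, "Usa states of wealthy"))     # False
    print(phrase_matches(plain, "United states is wealthy"))  # True
```

In this model the document ends up with position 2 holding usa/united/u, position 4 holding is/a/of and position 5 holding wealthy/america, which is why odd phrases such as "U.S. is america" line up while "The usa is wealthy" does not. The last two lines show that the odd phrases only match if the synonyms were applied when the field was indexed.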

However, this does not seem to be the case: the phrases that should match aren't matching at all! Here is the query I am running (without synonym expansion at query time, as the documentation suggests):

GET /my_index/country/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "United States is wealthy",
        "analyzer": "standard"
      }
    }
  }
}

What am I missing here?

Upvotes: 1

Views: 1290

Answers (2)

keety

Reputation: 17461

The example in the documentation works for me.

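You probably forgot to set the analyzer for the title field in the mapping. Without it, title is indexed with the default standard analyzer, so the synonyms are never applied at index time.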

Example:

1) Create Index

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "usa,united states,u s a,united states of america"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

2) Add Mapping

PUT my_index/country/_mapping
{
    "properties" : {
        "title" : {"type" : "string","analyzer" : "my_synonyms"}
    }
}

3) Index Document

PUT /my_index/country/1
{
  "title" : "The United States is wealthy"
}

4) Query

GET /my_index/country/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "United States is wealthy",
        "analyzer": "standard"
      }
    }
  }
}

5) Response:

{
   "took": 8,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.75942194,
      "hits": [
         {
            "_index": "my_index",
            "_type": "country",
            "_id": "1",
            "_score": 0.75942194,
            "_source": {
               "title": "The United States is wealthy"
            }
         }
      ]
   }
}

Upvotes: 1

IanGabes

Reputation: 2797

So close, you missed one thing!

In your query, you should change the analyzer! You have to run your query text through the my_synonyms analyzer to be able to match the synonyms. Currently your query uses the standard analyzer, which simply tokenizes your text as united, states, is, wealthy, without adding any of your synonyms.

Change this:

GET /my_index/country/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "United States is wealthy",
        "analyzer": "standard"
      }
    }
  }
}

To this:

GET /my_index/country/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "United States is wealthy",
        "analyzer": "my_synonyms"
      }
    }
  }
}

Now, when you query, the text United States will also be expanded to its synonyms (such as usa), so the query can match the synonym tokens.
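
Whether the query text collapses to usa alone or expands to the whole group depends on the synonym filter's `expand` option, which defaults to true. For an equivalence rule like the one in my_synonym_filter, `expand: true` maps every entry to all entries, while `expand: false` maps every entry to the first one (usa). The Python sketch below is a hypothetical model of that rewrite table only; `rewrite_targets` is an illustrative name, it ignores token positions, and it is not Elasticsearch code.

```python
def rewrite_targets(rule, expand=True):
    """Hypothetical model of an equivalent-synonym rule such as "a,b,c".

    Returns a dict mapping each phrase in the rule to the phrases it is
    rewritten to: the whole group when expand is True, otherwise only the
    first phrase of the group.
    """
    group = [phrase.strip() for phrase in rule.split(",")]
    targets = group if expand else group[:1]
    return {phrase: list(targets) for phrase in group}

if __name__ == "__main__":
    rule = "usa,united states,u s a,united states of america"
    print(rewrite_targets(rule)["united states"])
    # ['usa', 'united states', 'u s a', 'united states of america']
    print(rewrite_targets(rule, expand=False)["united states"])
    # ['usa']
```

With the default `expand: true`, then, the query text United States yields usa alongside the other variants rather than usa alone.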

Upvotes: 1
