ulric260

Reputation: 374

Elasticsearch using shingle filter with synonym

I have the following documents:

I want to retrieve my "south africa" document from:

I defined the following filters and analyzers:

POST test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}

1) With my_shingle, "south africa" is indexed as: south, southafrica, africa

2) With my_shingle_synonym, "south africa" is indexed as: south, s, southafrica, africa

3) With my_synonym_shingle, "south africa" is indexed as: south, souths, southsafrica, s, safrica, africa
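
The token output of each analyzer can be inspected with the _analyze API. This is a minimal sketch, assuming the index above has already been created and that the Elasticsearch version accepts a request body for _analyze (2.0 or later). Swap the analyzer name to compare the three variants:

GET test_index/_analyze
{
  "analyzer": "my_synonym_shingle",
  "text": "south africa"
}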

So with

I want "south africa" to be indexed as: south, s, southafrica, safrica, africa

Upvotes: 3

Views: 1468

Answers (1)

ChintanShah25

Reputation: 12672

You do not need a single analyzer that emits exactly that set of tokens. The problem can be solved by applying different analyzers to the same field through multi-fields.

Define the mapping for the field like this:

"mappings": {
  "your_mapping": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "my_shingle",
        "fields": {
          "synonym": {
            "type": "string",
            "analyzer": "my_synonym_shingle"
          }
        }
      }
    }
  }
}

Sample document to index:

PUT test_index/your_mapping/1
{
  "name" : "south africa"
}

Then query all variants of the name field by using a wildcard in the field name, so that both name and name.synonym are searched:

GET test_index/your_mapping/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name*"
      ],
      "query": "safrica"
    }
  }
}
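
To check which sub-field actually produces the safrica token, the _analyze API can be pointed at a field instead of a named analyzer. This is a sketch, assuming the mapping above is in place and the request-body form of _analyze is available (2.0 or later):

GET test_index/_analyze
{
  "field": "name.synonym",
  "text": "south africa"
}

Going by the token list in the question, my_synonym_shingle should emit safrica for this input, which is why the wildcard query on name* finds the document through name.synonym.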

Upvotes: 1
