Raghunandan J


Why isn't the shingle token filter in my analyzer yielding the expected results?

Hi, here are my index details:

PUT shingle_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "evolutionAnalyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "custom_shingle"
          ]
        }
      },
      "filter": {
        "custom_stop": {
            "type": "stop",
            "stopwords": "_english_"
        },
        "custom_shingle": {
            "type": "shingle",
            "min_shingle_size": "2",
            "max_shingle_size": "10",
            "output_unigrams": false
        }
      }
    }
  }, 
  "mappings": {
    "legacy" : {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "shingles": {
              "type": "text",
              "analyzer": "standard",
              "search_analyzer": "evolutionAnalyzer"
            },
            "as_is": {
              "type": "keyword"
            }
          },
          "analyzer": "standard"
        }
      }
    }
  }
}

I added two documents:

PUT shingle_test/legacy/1
{
  "name": "Chandni Chowk 2 Banglore"
}

PUT shingle_test/legacy/2
{
  "name": "Chandni Chowk"
}

Nothing is returned when I run this query:

GET shingle_test/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Chandni Chowk",
        "analyzer": "evolutionAnalyzer"
      }
    }
  }
}

I looked at every solution I could find online, but none of them helped.

Also, if I set "output_unigrams": true, it behaves just like a plain match query and returns results.
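To see why the first query comes back empty, it helps to compare the terms produced on each side. The Python sketch below is a rough model, not Elasticsearch itself: the regex only approximates the standard tokenizer, the helper names (tokenize, standard_analyzer, shingles, evolution_analyzer) are hypothetical, and it assumes the pre-7.0 behaviour in which the "standard" token filter does nothing.

```python
import re

def tokenize(text):
    # Rough stand-in for the "standard" tokenizer: split on anything that is
    # not a letter or digit. The real tokenizer follows Unicode word-boundary
    # rules; this approximation is enough for the sample names.
    return re.findall(r"[A-Za-z0-9]+", text)

def standard_analyzer(text):
    # The built-in "standard" analyzer also lowercases every token.
    return [t.lower() for t in tokenize(text)]

def shingles(tokens, min_size, max_size, output_unigrams):
    # Approximates the "shingle" token filter: at each position, optionally keep
    # the single token, then emit every run of min_size..max_size adjacent
    # tokens joined by a space.
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

def evolution_analyzer(text, output_unigrams=False):
    # evolutionAnalyzer as configured in the question: standard tokenizer, the
    # no-op "standard" filter, then custom_shingle (2..10). There is no
    # lowercase filter in the chain, so case is preserved.
    return shingles(tokenize(text), 2, 10, output_unigrams)

# "name" is indexed with the standard analyzer.
indexed = {
    "1": standard_analyzer("Chandni Chowk 2 Banglore"),
    "2": standard_analyzer("Chandni Chowk"),
}
# The query is analyzed with evolutionAnalyzer, output_unigrams = false.
query_terms = evolution_analyzer("Chandni Chowk")

print("query terms:", query_terms)
for doc_id, terms in indexed.items():
    print(doc_id, terms, "shared:", set(query_terms) & set(terms))
```

With output_unigrams set to false, the query side yields only the two-word shingle, while the indexed name field holds only single words, so the two term sets never intersect and the match query has nothing to hit.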

What I'm trying to achieve:

Given these documents:

  1. Chandni Chowk 2 Bangalore
  2. Chandni Chowk
  3. CCD Bangalore
  4. Istah shawarma and biryani
  5. Istah

So, searching for "Chandni Chowk 2 Bangalore" should return 1, 2

searching for "Chandni Chowk" should return 1, 2

searching for "Istah shawarma and biryani" should return 4, 5

searching for "Istah" should return 4, 5

searching for "CCD Bangalore" should return 3

Note: the search keyword will always be exactly equal to the value of the name field in some document. For example, in this index we can query "Chandni Chowk 2 Bangalore", "Chandni Chowk", "CCD Bangalore", "Istah shawarma and biryani" or "Istah". "CCD" on its own will never be queried against this index.


Answers (1)

Bhavya


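The analyzer parameter specifies the analyzer used for text analysis when indexing or searching a text field. In your mapping, name (and name.shingles) is indexed with the standard analyzer, so the index only holds single-word terms such as chandni and chowk. With "output_unigrams": false, evolutionAnalyzer turns the query "Chandni Chowk" into the single shingle term "Chandni Chowk", which never appears in the index, so the match query finds nothing. The fix is to index name.shingles with the same shingle analyzer you search with.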

Modify your index mapping as follows:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "evolutionAnalyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "custom_shingle"
          ]
        }
      },
      "filter": {
        "custom_stop": {
            "type": "stop",
            "stopwords": "_english_"
        },
        "custom_shingle": {
            "type": "shingle",
            "min_shingle_size": "2",
            "max_shingle_size": "10",
            "output_unigrams": true         // note this
        }
      }
    }
  }, 
  "mappings": {
    "legacy" : {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "shingles": {
              "type": "text",
              "analyzer": "evolutionAnalyzer",         // note this
              "search_analyzer": "evolutionAnalyzer"
            },
            "as_is": {
              "type": "keyword"
            }
          },
          "analyzer": "standard"
        }
      }
    }
  }
}
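A rough Python model of the mapping above, under the same simplifications as before: a regex stands in for the standard tokenizer, and the hypothetical evolution_analyzer helper stands in for the real analyzer on name.shingles (no lowercasing, shingles of 2..10 tokens, output_unigrams true). Because the same analyzer now runs at index time and at search time, the query terms also exist in the indexed documents.

```python
import re

def evolution_analyzer(text):
    # Sketch of evolutionAnalyzer after the change: regex approximation of the
    # standard tokenizer, no lowercasing, then shingles of 2..10 tokens with
    # output_unigrams = true.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])  # unigram, kept because output_unigrams is true
        for n in range(2, 11):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

docs = {"1": "Chandni Chowk 2 Banglore", "2": "Chandni Chowk"}
# name.shingles: the same analyzer is used at index time and at search time.
index = {doc_id: evolution_analyzer(name) for doc_id, name in docs.items()}
query_terms = evolution_analyzer("Chandni Chowk")

print("query terms:", query_terms)
for doc_id, terms in index.items():
    shared = [t for t in query_terms if t in terms]
    print(doc_id, "matches" if shared else "no match", shared, "of", len(terms), "terms")
```

Both documents contain all three query terms ("Chandni", "Chandni Chowk", "Chowk"), so both match. Document 2 carries far fewer terms, which is consistent with it ranking first in the search results, although the actual scores come from Elasticsearch's BM25 similarity, not from this sketch.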

The modified search query is:

{
  "query": {
    "match": {
      "name.shingles": {
        "query": "Chandni Chowk"
      }
    }
  }
}

Search Results:

"hits": [
      {
        "_index": "66127416",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.25759193,
        "_source": {
          "name": "Chandni Chowk"
        }
      },
      {
        "_index": "66127416",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.19363807,
        "_source": {
          "name": "Chandni Chowk 2 Banglore"
        }
      }
    ]

