simbolo

Reputation: 7504

Elasticsearch: Ability to search 'n1' and match 'N°1'

I've tried synonyms in various combinations so that I can query simply for n1 and find items containing N°1 (that's the degree symbol).

If I search for N°1 I can find the desired rows without a problem. The synonym filter itself works: if I search for 'test' it matches 'testword'. I wonder whether the asciifolding or lowercase filters could be interfering with the degree symbol, or something in the standard filters (even removing these filters doesn't make a difference).

This is from the index's settings.

filter: {
    exampleSynonyms: {
        type: 'synonym',
        synonyms: [
            'n1, no1, number1, no 1, n 1, number 1 => N°1',
            'test => testword'
        ]
    },
    exampleStops: {
        type: 'stop',
        stopwords: ['N°1', 'n°1']
    },
    exampleAscii: {
        type: 'asciifolding',
        preserve_original: true
    }
},
analyzer: {
    default_search: {
        tokenizer: 'standard',
        filter: ['exampleStops', 'exampleSynonyms', 'lowercase', 'exampleAscii' ]
    }

}

What could prevent the ° from being used in a synonym?

PS. The degree character is within the extended ASCII (Latin-1) set.

Upvotes: 0

Views: 66

Answers (1)

ChintanShah25

Reputation: 12672

The problem here is that the standard tokenizer removes ° before the text ever reaches the synonym filter. You can verify this with the Analyze API.

curl -XGET 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '
{
  "tokenizer" : "standard",
  "text" : "N°1"
}'

You will see two tokens, N and 1. Token filters are applied after tokenization, so rather than synonyms you could use a pattern_replace char filter, which runs before the tokenizer, to replace the degree symbol with an empty string. This is a minimal setup:

PUT degree
{
  "settings": {
    "analysis": {
      "analyzer": {
        "degree_analyzer": {
          "char_filter": [
            "degree_mapping"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "degree_mapping": {
          "type": "pattern_replace",
          "pattern": "°",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "mydoctype":{
      "properties": {
        "title" : {
          "type": "string",
          "analyzer": "degree_analyzer"
        }
      }
    }
  }
}

With this, N°1 will be indexed as n1, and a simple match query will give you the desired results:

{
  "query": {
    "match": {
      "title": "n1"
    }
  }
}
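
To see why the order of the stages matters, here is a rough Python sketch of the pipeline. It is only an illustration, not how Elasticsearch is implemented: `simple_tokenize` is a hypothetical stand-in that keeps runs of letters and digits, which I am assuming approximates the standard tokenizer closely enough for this input, and `char_filter`, `lowercase` and `analyze` are made-up helper names.

```python
import re
import unicodedata

DEGREE = "\u00b0"  # the degree sign


def char_filter(text):
    """Mimic the pattern_replace char filter: delete every degree sign."""
    return re.sub(DEGREE, "", text)


def simple_tokenize(text):
    """Rough stand-in for the standard tokenizer (an assumption):
    keep runs of letters and digits, drop everything else."""
    return re.findall(r"[^\W_]+", text)


def lowercase(tokens):
    """Mimic the lowercase token filter."""
    return [t.lower() for t in tokens]


def analyze(text, use_char_filter):
    # Char filters see the raw text, before it is split into tokens.
    if use_char_filter:
        text = char_filter(text)
    # Token filters only ever see what the tokenizer emitted.
    return lowercase(simple_tokenize(text))


# The degree sign is a symbol (category "So"), not a letter or digit,
# and its code point (176) is outside 7-bit ASCII.
print(unicodedata.category(DEGREE), ord(DEGREE))

print(analyze("N°1", use_char_filter=False))  # ['n', '1']
print(analyze("N°1", use_char_filter=True))   # ['n1']
```

Without the char filter the symbol is already gone by the time any token filter (including a synonym filter) runs, so a rule that mentions N°1 never has anything to match. With it, both the indexed text and the query n1 end up as the same single token.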

Hope this helps.

Upvotes: 1
