searching only digits in a mixed field (elasticsearch)

Question

I have a field containing phone numbers in two formats, XXX-XXX-XXXX or XXXXXXXXXX (it's a merged table).

I want to be able to search for XXXXXXXXXX and get results in both formats.

I tried using the decimal_digit token filter, but it didn't work. Here are the settings I tried:

mapping = {
    'mappings': {
        # DOC_TYPE holds the mapping type name (defined elsewhere, not shown)
        DOC_TYPE: {
            'properties': {
                'first_name': {
                    'type': 'text',
                    'analyzer': 'word_splitter'
                },
                'last_name': {
                    'type': 'text',
                    'analyzer': 'word_splitter'
                },
                'email': {
                    'type': 'text',
                    'analyzer': 'email'
                },
                'gender': {
                    'type': 'text'
                },
                'ip_address': {
                    'type': 'text'
                },
                'language': {
                    'type': 'text'
                },
                'phone': {
                    'type': 'text',
                    'analyzer': 'digits'
                },
                'id': {
                    'type': 'long'
                }

            }
        }
    },
    'settings': {
        'analysis': {
            'analyzer': {
                'my_analyzer': {
                    'type': 'whitespace'
                },
                'better': {
                    'type': 'standard'
                },
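                'word_splitter': {
                    'type': 'custom',
                    # min_gram/max_gram are tokenizer settings, so they live
                    # on the custom 'five_gram' tokenizer defined below
                    'tokenizer': 'five_gram',
                    'filter': [
                        'lowercase'
                    ]
                },
                'email': {
                    'type': 'custom',
                    'tokenizer': 'uax_url_email'
                },
                'digits': {
                    'type': 'custom',
                    'tokenizer': 'whitespace',
                    # decimal_digit only converts Unicode digits to 0-9;
                    # it does not remove the '-' characters
                    'filter': [
                        'decimal_digit'
                    ]
                }
            },
            'tokenizer': {
                'five_gram': {
                    'type': 'ngram',
                    'min_gram': 5,
                    'max_gram': 5
                }
            }
        }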
    }
}
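The `digits` analyzer above never removes the hyphens: the whitespace tokenizer keeps `123-456-7890` as a single token, and `decimal_digit` only maps Unicode decimal digits (Arabic-Indic, Devanagari, and so on) to ASCII 0-9. A rough Python imitation of that chain, assuming plain `str.split()` is a close enough stand-in for the whitespace tokenizer, shows the two formats end up as different tokens and so can never match each other:

```python
import unicodedata

def decimal_digit(token):
    # Mimics the decimal_digit token filter: any Unicode decimal digit
    # (e.g. Arabic-Indic U+0663) becomes its ASCII 0-9 equivalent;
    # every other character, including "-", passes through unchanged
    out = []
    for ch in token:
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)

def digits_analyzer(text):
    # Approximation of: whitespace tokenizer -> decimal_digit filter
    return [decimal_digit(tok) for tok in text.split()]

print(digits_analyzer("123-456-7890"))              # ['123-456-7890']
print(digits_analyzer("1234567890"))                # ['1234567890']
print(digits_analyzer("\u0661\u0662\u0663\u0664"))  # ['1234'] (Arabic-Indic digits)
```

The filter normalizes digit scripts, not punctuation, which is why it cannot make the hyphenated and plain formats equal.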

Any ideas?

Adam T · Accepted Answer

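Use a char_filter to remove the hyphens before tokenization. Because the same analyzer runs at index time and at search time, both formats are reduced to the same token. As a simple example: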

Set up the custom analyzer and apply it to the phone field.

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "phone_char_filter"
          ]
        }
      },
      "char_filter": {
        "phone_char_filter": {
          "type": "mapping",
          "mappings": [
            "- => "
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "phone": { 
          "type": "text",
          "analyzer": "phone_analyzer"
        }
      }
    }
  }
}
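Since the question builds its index body as a Python dict, the same phone setup can be written in that style. This is a sketch; `phone_index_body` and its `doc_type` parameter are hypothetical names, and the typed layout (`"_doc"` under `mappings`) matches the request above, while `doc_type=None` produces the typeless layout used from Elasticsearch 7 onward:

```python
import json

def phone_index_body(doc_type="_doc"):
    # Hypothetical helper: the answer's phone analyzer settings as a Python dict
    properties = {"phone": {"type": "text", "analyzer": "phone_analyzer"}}
    # Typed mappings nest properties under the type name; typeless ones do not
    if doc_type is None:
        mappings = {"properties": properties}
    else:
        mappings = {doc_type: {"properties": properties}}
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    # Mapping char filter: replace every "-" with nothing
                    "phone_char_filter": {"type": "mapping", "mappings": ["- => "]}
                },
                "analyzer": {
                    "phone_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "char_filter": ["phone_char_filter"],
                    }
                },
            }
        },
        "mappings": mappings,
    }

print(json.dumps(phone_index_body(), indent=2))
```

The resulting dict can be passed as the index-creation body from Python, replacing the `digits` analyzer on the `phone` field.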

Add some docs

POST my_index/_doc
{"phone": "123-456-7890"}

POST my_index/_doc
{"phone": "2345678901"}

Search in xxx-xxx-xxxx format

GET my_index/_search
{
  "query": {
    "match": {
      "phone": "123-456-7890"
    }
  }
}

Search in xxxxxxxxxx format

GET my_index/_search
{
  "query": {
    "match": {
      "phone": "1234567890"
    }
  }
}
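Both searches return only the 123-456-7890 document. To see why, the analysis chain can be mimicked in plain Python. This is an approximation: the regex split is only a rough stand-in for the standard tokenizer (which follows Unicode UAX #29 rules), and the document IDs 1 and 2 are hypothetical:

```python
import re

def phone_char_filter(text):
    # Mimics the mapping char filter "- => ": every "-" is replaced with nothing
    return text.replace("-", "")

def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: split on runs of
    # characters that are not ASCII letters or digits
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def phone_analyzer(text):
    # Char filter runs first, then the tokenizer; no token filters are configured
    return standard_tokenize(phone_char_filter(text))

# Hypothetical IDs for the two documents indexed above
docs = {1: "123-456-7890", 2: "2345678901"}
indexed_tokens = {doc_id: set(phone_analyzer(phone)) for doc_id, phone in docs.items()}

def match(query):
    # A match query analyzes its input with the field's analyzer and, with the
    # default OR operator, hits any document sharing at least one token
    terms = set(phone_analyzer(query))
    return sorted(doc_id for doc_id, tokens in indexed_tokens.items() if tokens & terms)

print(standard_tokenize("123-456-7890"))  # ['123', '456', '7890']
print(phone_analyzer("123-456-7890"))     # ['1234567890']
print(match("123-456-7890"))              # [1]
print(match("1234567890"))                # [1]
```

Without the char filter, the standard tokenizer splits `123-456-7890` into `123`, `456` and `7890`, so a search for `1234567890` would find nothing; stripping the hyphens first is what makes both formats produce the same single token.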
