I have a field containing phone numbers in two formats, XXX-XXX-XXXX or XXXXXXXXXX (it's a merged table).
I want to be able to search for XXXXXXXXXX and get matches stored in either format.
I tried using the decimal_digit filter, but it didn't work. Here are the settings I tried:
mapping = {
    'mappings': {
        DOC_TYPE: {
            'properties': {
                'first_name': {
                    'type': 'text',
                    'analyzer': 'word_splitter'
                },
                'last_name': {
                    'type': 'text',
                    'analyzer': 'word_splitter'
                },
                'email': {
                    'type': 'text',
                    'analyzer': 'email'
                },
                'gender': {
                    'type': 'text'
                },
                'ip_address': {
                    'type': 'text'
                },
                'language': {
                    'type': 'text'
                },
                'phone': {
                    'type': 'text',
                    'analyzer': 'digits'
                },
                'id': {
                    'type': 'long'
                }
            }
        }
    },
    'settings': {
        'analysis': {
            'analyzer': {
                'my_analyzer': {
                    'type': 'whitespace'
                },
                'better': {
                    'type': 'standard'
                },
                'word_splitter': {
                    'type': 'custom',
                    'tokenizer': 'nGram',
                    'min_gram': 5,
                    'max_gram': 5,
                    'filter': [
                        'lowercase'
                    ]
                },
                'email': {
                    'type': 'custom',
                    'tokenizer': 'uax_url_email'
                },
                'digits': {
                    'type': 'custom',
                    'tokenizer': 'whitespace',
                    'filter': [
                        'decimal_digit'
                    ]
                }
            }
        }
    }
}
Any ideas?
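A rough Python simulation of the digits analyzer above (whitespace tokenizer, then the decimal_digit token filter) shows why it cannot make the two formats match. decimal_digit only folds Unicode decimal digits (for example Arabic-Indic digits) to ASCII 0-9; it leaves every other character, including the hyphen, untouched. This is a sketch, not Elasticsearch itself: str.split() is used as an approximation of the whitespace tokenizer, and the function name is hypothetical.

```python
import unicodedata


def digits_analyzer(text: str) -> list[str]:
    """Approximate the question's 'digits' analyzer:
    whitespace tokenizer followed by the decimal_digit token filter."""
    tokens = text.split()  # approximation of the whitespace tokenizer
    # decimal_digit: any Unicode decimal digit (category Nd) becomes its ASCII 0-9 form;
    # all other characters, including '-', pass through unchanged.
    return [
        "".join(
            str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
            for ch in tok
        )
        for tok in tokens
    ]


print(digits_analyzer("123-456-7890"))        # ['123-456-7890']  (hyphens survive)
print(digits_analyzer("1234567890"))          # ['1234567890']
print(digits_analyzer("\u0661\u0662\u0663"))  # ['123']  (Arabic-Indic digits folded)
```

Because "123-456-7890" and "1234567890" end up as two different tokens, a search for one never matches the other.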
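Use a mapping char_filter to remove the hyphens before tokenization. The decimal_digit filter only converts Unicode decimal digits (e.g. Arabic-Indic or full-width digits) to ASCII 0-9; it never strips the hyphens, so the two formats still produce different tokens. Because the analyzer set on a field is also applied to match queries against that field (unless you configure a separate search_analyzer), both formats are reduced to the same token at index time and at search time. As a simple example: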
Set up the custom analyzer and apply it to the phone field.
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "phone_char_filter"
          ]
        }
      },
      "char_filter": {
        "phone_char_filter": {
          "type": "mapping",
          "mappings": [
            "- => "
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "phone": {
          "type": "text",
          "analyzer": "phone_analyzer"
        }
      }
    }
  }
}
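Since the question builds its index body as a Python dict, the same settings could be written in that style. This is a sketch under assumptions: DOC_TYPE is the question's placeholder, given a hypothetical value here, and a mapping-type level like this only applies to Elasticsearch 6.x and earlier (on 7.x and later, 'properties' goes directly under 'mappings'). In the question's full mapping, the change amounts to pointing 'phone' at phone_analyzer instead of digits.

```python
import json

# Hypothetical value for the question's DOC_TYPE placeholder.
# Mapping types only exist up to Elasticsearch 6.x; on 7.x+ drop this level.
DOC_TYPE = "_doc"

index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Replace every '-' with nothing before the tokenizer runs.
                "phone_char_filter": {"type": "mapping", "mappings": ["- => "]},
            },
            "analyzer": {
                "phone_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "char_filter": ["phone_char_filter"],
                },
            },
        }
    },
    "mappings": {
        DOC_TYPE: {
            "properties": {
                "phone": {"type": "text", "analyzer": "phone_analyzer"},
            }
        }
    },
}

# The JSON printed here is the body to send with "PUT my_index".
print(json.dumps(index_body, indent=2))
```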
Add some documents:
POST my_index/_doc
{"phone": "123-456-7890"}
POST my_index/_doc
{"phone": "2345678901"}
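To see which tokens those two documents produce, here is a rough Python approximation of phone_analyzer: the mapping char filter drops every hyphen, then a simplified tokenizer splits on anything that is not a letter or digit. The regex tokenizer is an assumption that only stands in for the standard tokenizer on plain ASCII input like this (the real standard tokenizer also breaks on hyphens between digits, which is why the char filter is needed); the function names are hypothetical.

```python
import re


def phone_char_filter(text: str) -> str:
    # Mirrors the "- => " mapping: every hyphen is replaced with nothing.
    return text.replace("-", "")


def standard_tokenize(text: str) -> list[str]:
    # Crude stand-in for the standard tokenizer on ASCII input:
    # runs of letters/digits become tokens; hyphens, spaces, etc. split them.
    return re.findall(r"[A-Za-z0-9]+", text)


def phone_analyzer(text: str) -> list[str]:
    # Char filters run first, then the tokenizer (no token filters configured).
    return standard_tokenize(phone_char_filter(text))


for phone in ["123-456-7890", "2345678901"]:
    print(phone, "->", phone_analyzer(phone))
# 123-456-7890 -> ['1234567890']
# 2345678901 -> ['2345678901']

# Without the char filter, the hyphenated number is split into three tokens:
print(standard_tokenize("123-456-7890"))  # ['123', '456', '7890']
```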
Search in XXX-XXX-XXXX format:
GET my_index/_search
{
  "query": {
    "match": {
      "phone": "123-456-7890"
    }
  }
}
Search in XXXXXXXXXX format:
GET my_index/_search
{
  "query": {
    "match": {
      "phone": "1234567890"
    }
  }
}
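Both searches should return the 123-456-7890 document, because the query string goes through the same analyzer as the indexed value. The sketch below imitates that end to end with a tiny in-memory inverted index and a match query using the default OR operator. The document ids and the regex-based analyzer are assumptions for illustration; this is a model of the behaviour, not the Elasticsearch client.

```python
import re


def phone_analyzer(text: str) -> list[str]:
    # Approximation of phone_analyzer: drop '-', then split on non-alphanumerics.
    return re.findall(r"[A-Za-z0-9]+", text.replace("-", ""))


# Hypothetical ids for the two documents POSTed above.
docs = {1: "123-456-7890", 2: "2345678901"}

# Inverted index: token -> set of ids of documents containing that token.
index: dict[str, set[int]] = {}
for doc_id, phone in docs.items():
    for tok in phone_analyzer(phone):
        index.setdefault(tok, set()).add(doc_id)


def match(query: str) -> list[int]:
    # Like a match query with operator OR: a doc hits if it shares any analyzed token.
    hits: set[int] = set()
    for tok in phone_analyzer(query):
        hits |= index.get(tok, set())
    return sorted(hits)


print(match("123-456-7890"))  # [1]
print(match("1234567890"))    # [1]
print(match("234-567-8901"))  # [2]  (hyphenated query finds the unhyphenated doc)
```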