Kiryl A.

Reputation: 164

ElasticSearch doesn't recognise numbers

I created the index with these analysis settings and mappings:

PUT :9200/subscribers

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
         "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
         "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

I index a new document:

POST :9200/subscribers/doc/?pretty

{
  "id": "1421997",
  "name": "John 333 Martin",
  "contact_number":"+43fdsds*543254365"
}

Then I search across multiple fields like this:

POST :9200/subscribers/doc/_search

{
    "query": {
        "multi_match": {
            "query": "Joh",
            "fields": [
                "name",
                "id",
                "contact_number"
            ],
            "type": "best_fields"
        }
    }
}

It successfully returns "John 333 Martin". But with "query": "333", "query": "+43fds" or "query": "14219", it returns nothing. That's strange, because I included digits in the tokenizer's token_chars too:

 "token_chars": [
            "letter",
            "digit"
          ]

What should I do in order to search by all fields and see results with numbers too?


UPDATE:

Even the GET :9200/subscribers/_analyze with

{
  "analyzer": "autocomplete",
  "text": "+43fdsds*543254365"
}

shows the correct tokens, such as "43", "43f", "43fd", "43fds". But the search doesn't find them. Maybe my search query is incorrect?

Upvotes: 0

Views: 527

Answers (1)

ben5556

Reputation: 3018

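Your search uses a different analyzer from the one that creates the tokens in your inverted index. Your search_analyzer (autocomplete_search) uses the lowercase tokenizer, which splits the text at every character that is not a letter, so digits are stripped from the query before it is matched against the index. For example: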

POST _analyze
{
  "tokenizer": "lowercase",
  "text":     "+43fdsds*543254365"
}

produces

{
  "tokens" : [
    {
      "token" : "fdsds",
      "start_offset" : 3,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    }
  ]
}
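
To see what this does to the queries from the question, here is a rough Python stand-in for the lowercase tokenizer. It is only a sketch: lowercase_tokenize is a hypothetical helper, not an Elasticsearch API, and the regex merely approximates "split on non-letters, then lowercase".

```python
import re

def lowercase_tokenize(text):
    """Rough stand-in (assumption) for the `lowercase` tokenizer:
    keep runs of letters only, then lowercase each run."""
    # [^\W\d_]+ matches runs of letters: word characters minus digits and underscore.
    return [m.group(0).lower() for m in re.finditer(r"[^\W\d_]+", text)]

# The four queries tried in the question.
queries = ["Joh", "333", "+43fds", "14219"]
search_tokens = {q: lowercase_tokenize(q) for q in queries}

for q, toks in search_tokens.items():
    print(f"{q!r} -> {toks}")
```

Under this approximation, "333" and "14219" produce no search tokens at all, and "+43fds" is reduced to "fds", which is not a prefix of any indexed word, so none of them can match the edge n-grams in the index.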

Instead, use the standard analyzer as your search_analyzer, i.e. modify your mapping as shown below, and it will work as expected:

"mappings": {
    "doc": {
      "properties": {
         "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
         "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }

Using the standard analyzer:

POST _analyze
{
  "analyzer": "standard",
  "text":     "+43fdsds*543254365"
}

produces

{
  "tokens" : [
    {
      "token" : "43fdsds",
      "start_offset" : 1,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "543254365",
      "start_offset" : 9,
      "end_offset" : 18,
      "type" : "<NUM>",
      "position" : 1
    }
  ]
}
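
As a sanity check, the matching can be simulated outside Elasticsearch. The sketch below is an approximation under stated assumptions: edge_ngram_tokens mimics the question's autocomplete analyzer (letters and digits, prefixes of 2 to 10 characters, lowercased), standard_like_tokens is a simplified stand-in for the standard analyzer that is only adequate for ASCII inputs like these, and matches treats the multi_match as "any query token found in any field". All three names are hypothetical helpers, not Elasticsearch APIs.

```python
import re

def edge_ngram_tokens(text, min_gram=2, max_gram=10):
    """Approximate the `autocomplete` analyzer from the question:
    split on anything that is not a letter or digit, emit every
    prefix of length min_gram..max_gram, lowercased."""
    tokens = set()
    for word in re.findall(r"[^\W_]+", text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.add(word[:n].lower())
    return tokens

def standard_like_tokens(text):
    """Simplified stand-in (assumption) for the standard analyzer:
    runs of letters/digits, lowercased."""
    return [w.lower() for w in re.findall(r"[^\W_]+", text)]

# The document indexed in the question.
doc = {
    "id": "1421997",
    "name": "John 333 Martin",
    "contact_number": "+43fdsds*543254365",
}
index_tokens = {field: edge_ngram_tokens(value) for field, value in doc.items()}

def matches(query):
    # A hit if any search token equals any indexed token in any field.
    return any(tok in toks
               for tok in standard_like_tokens(query)
               for toks in index_tokens.values())

for q in ["Joh", "333", "+43fds", "14219"]:
    print(q, matches(q))
```

One caveat that follows from the index settings: a query term longer than max_gram (10 characters here) has no equally long n-gram in the index, so it would not match even with the standard search analyzer.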

Upvotes: 1
