Searching a number in a string field with query_string on Elasticsearch

Question

Among other text fields, I've got this string field in my Elasticsearch index:

"user": { "type": "string", "analyzer": "simple", "norms": { "enabled": False } }

It gets filled with a typical username, e.g. "simon".

Using query_string I can limit my search results for "other search terms" to this particular user:

'query': { 'query_string': { 'query': 'user:simon other search terms' } }

The default operator is set to "AND". However, when a username consists only of digits (saved and indexed as a string), Elasticsearch appears to ignore the "user:..." clause. For example:

'query': { 'query_string': { 'query': 'user:111 other search terms' } }

yields the same results as

'query': { 'query_string': { 'query': 'other search terms' } }

Any idea what might be the cause or how to fix it?

Andrei Stefan · Accepted Answer

You are using the simple analyzer. As the documentation says:

An analyzer of type simple that is built using a Lower Case Tokenizer.

The lower case tokenizer in turn combines the letter tokenizer with the lower case token filter. The problem with your specific test data is that the letter tokenizer splits the text at every non-letter character, and digits are non-letters. A value such as "111" therefore yields no tokens at all, both at index time and at query time, so the "user:111" clause is effectively dropped. Java's Character.isLetter method defines what exactly counts as a letter. In contrast, Character.isDigit defines what exactly counts as a digit.
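
To make the tokenization behavior concrete, here is a minimal Python sketch that approximates what the simple analyzer does: it keeps runs of letters, lowercases them, and discards everything else. The function name simple_analyze is hypothetical, and Python's str.isalpha is only an approximation of Java's Character.isLetter, so treat this as an illustration rather than the actual Lucene implementation.

```python
def simple_analyze(text):
    """Approximate the simple analyzer: split at non-letters, lowercase the rest."""
    tokens = []
    current = []
    for ch in text:
        if ch.isalpha():
            # Letters extend the current token (lowercased).
            current.append(ch.lower())
        elif current:
            # Any non-letter, including a digit, ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(simple_analyze("simon"))  # a letters-only username survives as one token
print(simple_analyze("111"))    # a digits-only username produces no tokens
```

Under this approximation, a digits-only username leaves nothing in the index for the field and nothing in the query to match against, which is consistent with the behavior described in the question.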

You may want to look at the standard analyzer (built on the standard tokenizer) instead, since it keeps digits as tokens.
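
As a rough sketch of that suggestion, the mapping from the question could be rewritten with only the analyzer changed. The variable name user_mapping is hypothetical, and the other settings are copied from the question as they were. Since the analyzer is applied at index time, existing documents would most likely need to be reindexed before numeric usernames become searchable.

```python
# Hypothetical replacement mapping for the "user" field.
# Only the analyzer differs from the mapping shown in the question.
user_mapping = {
    "user": {
        "type": "string",
        "analyzer": "standard",  # keeps digit-only values such as "111" as tokens
        "norms": {"enabled": False},
    }
}
```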
