nKognito

Reputation: 6363

Special characters within indexed fields

I'm facing some interesting behavior while searching email addresses with a Query String Filter:

.filteredQuery(
   queryStringQuery(String.format("*%s*", query))
       .field("firstName").field("lastName").field("email").field("phone"),
   null
)

If I pass domain.com as the query (assuming such a value exists in the index), the results are fine, but once I pass @domain.com the results are empty. Are there some limitations on special characters?

Upvotes: 2

Views: 934

Answers (1)

moliware

Reputation: 10278

If you set analyze_wildcard to true, it should work. By default, the query string query doesn't analyze tokens that contain wildcards; if you set that option to true, Elasticsearch will try to. This option is not perfect, as the docs say:

By setting this value to true, a best effort will be made to analyze those as well.

The reason for your empty result is that the default analyzer removes the @ at index time, but when you search for *@domain.com* with analyze_wildcard set to false, the @ is not removed at query time.

The code would look like this:

.filteredQuery(
    queryStringQuery(String.format("*%s*", query)).analyzeWildcard(true)
        .field("firstName").field("lastName").field("email").field("phone"),
    null
)

EDIT: A better explanation of why you get an empty result.

First of all, analyzers can run at index time (you set this in your mapping) and at query time (not all queries run the analyzer at query time).

In your case, at index time the standard analyzer analyzes the email field as follows:

name@domain.com => indexed as the two tokens name and domain.com

This means that your document will contain two tokens, name and domain.com. If you tried to find the exact term "name@domain.com", you wouldn't find anything, because your document no longer contains the full email address as a single token.
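To make the index-time step concrete, here is a minimal Java sketch that roughly imitates what happens to the email field. It is an approximation under stated assumptions, not Lucene's real StandardTokenizer (which follows the Unicode word-break rules): the hypothetical class ApproxStandardTokenizer keeps runs of letters and digits, keeps a '.' only when it sits between two letters/digits, treats everything else (including '@') as a separator, and lowercases the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ApproxStandardTokenizer {

    // Hypothetical approximation, NOT Lucene's StandardTokenizer:
    // letters/digits build a token, '.' survives only between two letters/digits
    // (so "domain.com" stays whole), and any other character, such as '@',
    // ends the current token. Tokens are lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keepDot = c == '.'
                    && i > 0 && i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i - 1))
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || keepDot) {
                current.append(c);
            } else if (current.length() > 0) {
                // A separator closes the token built so far.
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("name@domain.com");
        System.out.println(tokens); // [name, domain.com]
        // The full address is not among the tokens, so an exact-term lookup finds nothing.
        System.out.println(tokens.contains("name@domain.com")); // false
    }
}
```

If you want to see the real analyzer output for your own mapping, the _analyze API of your Elasticsearch version is the authoritative check; this sketch only illustrates the idea.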

Now, at query time, you are running the query string *@domain.com*. By default, the query string query doesn't analyze tokens that contain wildcards, so you are looking for tokens that contain @domain.com, and no such token exists in your index.

If you set analyze_wildcard to true, Elasticsearch also analyzes tokens that contain wildcards, so your query is transformed into *domain.com*, and in this case there are documents that match.
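The query-time difference can be illustrated with another small Java sketch. It is a simplification: Lucene compiles wildcard terms into automata rather than Java regexes, and the class WildcardMatchDemo and its methods are hypothetical. It assumes the indexed tokens are name and domain.com, as described above, and checks each wildcard pattern against them, once as typed (analyze_wildcard=false) and once with the @ stripped the way the analyzer would (analyze_wildcard=true).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardMatchDemo {

    // Translate a query-string style wildcard ('*' = any run, '?' = one char)
    // into a regex, quoting every other character so it matches literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // A wildcard term matches the document if any single indexed token matches it fully.
    static boolean matchesAnyToken(String wildcard, List<String> tokens) {
        Pattern p = toRegex(wildcard);
        return tokens.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        // Tokens produced at index time for "name@domain.com".
        List<String> indexed = Arrays.asList("name", "domain.com");

        // analyze_wildcard=false: the pattern is used as typed, '@' included.
        System.out.println(matchesAnyToken("*@domain.com*", indexed)); // false

        // analyze_wildcard=true: analysis drops the '@', leaving *domain.com*.
        System.out.println(matchesAnyToken("*domain.com*", indexed)); // true
    }
}
```

The point of the design is that matching happens per token: no single indexed token contains an @, so the unanalyzed pattern can never match, no matter how the rest of the address was indexed.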

Upvotes: 1
