Reputation: 9232
I have the following ElasticSearch query, which I expected to return only the documents whose email field equals [email protected]:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "[email protected]"
          }
        }
      ]
    }
  }
}
The mapping for the user type that is being searched is the following:
{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
The following is a sample of the results returned from ElasticSearch:
[
  {
    "_index": "users",
    "_type": "user",
    "_id": "54b19c417dcc4fe40d728e2c",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "John Smith",
      "nickname": "jsmith"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "9c417dcc4fe40d728e2c54b1",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "Walter White",
      "nickname": "wwhite"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "4fe40d728e2c54b19c417dcc",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "Jimmy Fallon",
      "nickname": "jfallon"
    }
  }
]
Given the query above, I expected a document to be returned only when its email property exactly matches '[email protected]'.
How does the ElasticSearch DSL query need to change so that it returns only exact matches on email?
Upvotes: 7
Views: 3551
Reputation: 19273
The email field was tokenized, which is the reason for this anomaly. When you indexed the address, the analyzer split it into separate tokens:
"[email protected]" => [ "myemail" , "gmail.com" ]
This way, a search for either myemail or gmail.com matches the document. The analyzer is also applied to the search query, so when you search for [email protected], the query gets broken into
"[email protected]" => [ "john" , "gmail.com" ]
Because the "gmail.com" token is common to the search term and the indexed term, you get a match.
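To make the overlap concrete, here is a rough Python sketch of what happens. It is not Elasticsearch's actual analyzer: `analyzed_tokens` is a hypothetical stand-in that only lowercases and splits on "@" and whitespace, and the addresses are made-up examples. The last lines contrast this with indexing the whole address as a single term, which is the fix described below.

```python
import re

def analyzed_tokens(text):
    """Simplified stand-in for the default analyzer: lowercase, split on '@' and whitespace."""
    return [t for t in re.split(r"[@\s]+", text.lower()) if t]

def single_term(text):
    """Stand-in for a field that is not analyzed: the whole value is one term."""
    return [text]

def matches(query, indexed_value, tokenize):
    """A match query with the default OR operator hits if any term is shared."""
    return bool(set(tokenize(query)) & set(tokenize(indexed_value)))

# Hypothetical addresses, used only for illustration.
docs = ["alice@example.com", "bob@example.com"]
query = "bob@example.com"

analyzed_hits = [d for d in docs if matches(query, d, analyzed_tokens)]
exact_hits = [d for d in docs if matches(query, d, single_term)]

print(analyzed_tokens(query))  # the query is split into two tokens
print(analyzed_hits)           # both documents share the domain token
print(exact_hits)              # only the identical address remains
```

With the analyzed field, every address on the same domain comes back with the same score, which matches the identical `_score` values in the question's results.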
To override this behavior, declare the email field as not_analyzed. Tokenization then won't happen, and the entire string is indexed as a single term.
With "not_analyzed":
"[email protected]" => [ "[email protected]" ]
So modify the mapping to the following and you should be good:
{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
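The mapping above is in the shape the get-mapping API returns. As a hedged sketch, the request bodies could be built like this in Python. It assumes the older "string" / "not_analyzed" mapping syntax used in this thread and that the index is created fresh, since as far as I know the mapping of an existing field cannot be changed in place and the data would need to be reindexed. The helper name `exact_email_query` and the address are hypothetical, and a term query is one option for the exact lookup rather than the only one.

```python
import json

# Body for creating the "users" index with an exact-match email field.
# Assumes the pre-5.x "string" / "not_analyzed" mapping syntax from this thread.
create_index_body = {
    "mappings": {
        "user": {
            "properties": {
                "email": {"type": "string", "index": "not_analyzed"},
                "name": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                },
                "nickname": {"type": "string"},
            }
        }
    }
}

def exact_email_query(address):
    """Build a term query, which looks the address up as one unanalyzed term."""
    return {"query": {"term": {"email": address}}}

# Hypothetical address, used only for illustration.
search_body = exact_email_query("bob@example.com")

print(json.dumps(create_index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The original match query should also behave as an exact lookup once the field is not_analyzed, because the query text is then no longer split into tokens.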
I have described the problem more precisely and another approach to solve it here.
Upvotes: 12