TheJediCowboy

Reputation: 9232

Why is ElasticSearch match query returning all results?

I have the following ElasticSearch query, which I expected to return only the documents whose email field equals [email protected]:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "email": "[email protected]"
          }
        }
      ]
    }
  }
}

The mapping for the user type that is being searched is the following:

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}

The following is a sample of the results returned from ElasticSearch:

[
  {
    "_index": "users",
    "_type": "user",
    "_id": "54b19c417dcc4fe40d728e2c",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "John Smith",
      "nickname": "jsmith"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "9c417dcc4fe40d728e2c54b1",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "Walter White",
      "nickname": "wwhite"
    }
  },
  {
    "_index": "users",
    "_type": "user",
    "_id": "4fe40d728e2c54b19c417dcc",
    "_score": 0.23983537,
    "_source": {
      "email": "[email protected]",
      "name": "Jimmy Fallon",
      "nickname": "jfallon"
    }
  }
]

Given the query above, I expected only documents whose email value exactly matches '[email protected]' to be returned.

How does the ElasticSearch DSL query need to change in order to return only exact matches on email?

Upvotes: 7

Views: 3551

Answers (1)

Vineeth Mohan

Reputation: 19273

The email field is being tokenized, which is the reason for this behavior. When you indexed a document, its email address was split into tokens like this:

"[email protected]" => [ "myemail" , "gmail.com" ]

This way, a search for either myemail or gmail.com matches the document. The problem is that the match query also applies the analyzer to the search text, so when you search for [email protected], it gets broken into

"[email protected]" => [ "john" , "gmail.com" ]

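Because the "gmail.com" token appears in both the search terms and the indexed terms, you get a match. By default the match query combines its tokens with OR, so a single shared token is enough, and every document with a gmail.com address is returned.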
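To make this concrete: with the default `or` operator, the `match` query builds a boolean query with one clause per analyzed token, so the original query behaves roughly like the following. This is an illustrative equivalent using the tokens listed above, not the exact internal form Elasticsearch generates:

```json
{
  "query": {
    "bool": {
      "should": [
        { "term": { "email": "john" } },
        { "term": { "email": "gmail.com" } }
      ]
    }
  }
}
```

Since at least one `should` clause has to match, any document whose email contains the `gmail.com` token is a hit, and all such documents get the same score in the sample results.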

To override this behavior, declare the email field as not_analyzed. Tokenization is then skipped, and the entire string is indexed as a single term.

With "not_analyzed":

"[email protected]" => [ "[email protected]" ]

So modify the mapping as follows and you should be good:

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
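With email mapped as not_analyzed, the whole address is a single term, so a `term` query (which does not analyze its input) matches only identical values. A minimal sketch, where `user@example.com` is a placeholder address rather than one from the question. It assumes the new mapping was applied to a freshly created index and the documents were reindexed, because the `index` setting of an existing field cannot be changed in place:

```json
{
  "query": {
    "term": {
      "email": "user@example.com"
    }
  }
}
```

The original `match` query should also behave as expected against the new mapping, since a not_analyzed field treats the query text as one term. Keep in mind that the comparison is then exact, including case, so `User@Example.com` would not match `user@example.com`.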

I have described the problem more precisely and another approach to solve it here.

Upvotes: 12
