Steve P.

Reputation: 14709

Fuzzy not functioning as expected (one term search, see example)

Consider the following results from:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d '
{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}'

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing on the weekends.",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.30685282,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

Now when I execute the following query:

curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d '
{
  "query": {
    "fuzzy": {
      "last_name": {
        "value": "Smitt",
        "fuzziness": 1
      }
    }
  }
}'

This returns NO results despite the Levenshtein distance of "Smith" and "Smitt" being 1. The same happens with a value of "Smit". If I set fuzziness to 2, I get results. What am I missing here?

Upvotes: 0

Views: 46

Answers (1)

ThomasC

Reputation: 8175

I assume that the last_name field you are querying is an analyzed string. The indexed term will therefore be smith and not Smith.

Returns NO results despite the Levenshtein distance of "Smith" and "Smitt" being 1.

The fuzzy query does not analyze the search term, so the Levenshtein distance between your input and the indexed term is actually not 1 but 2:

  1. Smitt -> Smith
  2. Smith -> smith
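The two edits above can be checked with a small script. This is only a sketch of the plain Levenshtein distance (insert, delete, substitute) written for illustration; the function name is my own and it is not the code Elasticsearch runs internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1  # comparison is case-sensitive
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("Smitt", "Smith"))  # 1: what the question expected
print(levenshtein("Smitt", "smith"))  # 2: against the lowercased indexed term
print(levenshtein("Smit", "smith"))   # 2: same problem for "Smit"
```

The capital S costs one edit on its own, which is why fuzziness 1 is not enough against the lowercased term.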

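Try using this mapping, and your query with fuzziness = 1 will work (the mapping of an existing field generally cannot be changed in place, so you will probably have to recreate the index and reindex your documents):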

PUT /megacorp/employee/_mapping
{
  "employee":{
    "properties":{
      "last_name":{
        "type":"string",
        "index":"not_analyzed"
      }
    }
  }
}
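If you would rather keep the field analyzed, another option might be to lowercase the term on the client before sending the fuzzy query. This assumes the field uses the default standard analyzer, which lowercases terms at index time. The sketch below only builds the request body; build_fuzzy_query is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_fuzzy_query(field: str, value: str, fuzziness: int = 1) -> dict:
    """Build a fuzzy query body with the term lowercased on the client side.

    Hypothetical helper: assumes the indexed terms were lowercased by the
    analyzer, so the search term is lowercased to match them.
    """
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value.lower(),
                    "fuzziness": fuzziness,
                }
            }
        }
    }


body = build_fuzzy_query("last_name", "Smitt")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of the same curl command used in the question. With "smitt" as the value, only one edit separates it from the indexed term smith.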

Hope this helps

Upvotes: 1
