Reputation: 1231
I have my mapping like this:
{
  "doc": {
    "mappings": {
      "mydocument": {
        "properties": {
          "file": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "file": {
                "type": "string",
                "store": true,
                "term_vector": "with_positions_offsets"
              },
              "author": {
                ...
When I search for a complete word I get the result:
"query": {
  "fuzzy_like_this" : {
    "fields" : ["file"],
    "like_text" : "This_is_something_I_want_to_search_for",
    "max_query_terms" : 12
  }
},
"highlight" : {
  "number_of_fragments" : 3,
  "fragment_size" : 650,
  "fields" : {
    "file" : { }
  }
}
But if I change the search term to "This_is_something_I_want"
I get nothing. What am I missing?
Upvotes: 1
Views: 154
Reputation: 1518
To implement a partial match, you must first understand what fuzzy_like_this does, and then decide what you want partial matching to return. fuzzy_like_this performs two key functions.
The like_text will be analyzed using the default analyzer. All the resulting tokens will then be used to find documents based on term frequency, or tf-idf. This typically means that the input text will be split on whitespace and lowercased. This_is_something_I_want contains no whitespace, so it will be tokenized to the single term this_is_something_i_want. Unless you have files containing this exact term, no documents will match.
The resulting terms are then fuzzified. Fuzzy searches score terms based on how many character changes need to be made to one word to match another word. For instance, to get from bat to hat we need to make 1 character change. In our case, to get from this_is_something_i_want to this_is_something_i_want_to_search_for, we need to make 14 character changes (adding _to_search_for). Standard fuzzy search only allows for 3 character changes when working with terms longer than 5 or 6 characters. Increasing the fuzzy limit to 14 would, however, produce severely skewed results.
So neither of these functions will help produce the results you seek.
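If you want to check the tokenization for yourself, you can run the text through the _analyze API. The body below is only a sketch: it assumes an Elasticsearch version whose _analyze endpoint accepts a JSON body (older releases take the analyzer and the text as URL parameters instead), and it assumes the field uses the standard analyzer.

```json
{
  "analyzer": "standard",
  "text": "This_is_something_I_want"
}
```

If the explanation above holds for your setup, the response should list one token for the whole string rather than five separate words.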
Here is what I can suggest:
You can implement an analyzer that splits on underscores. The tokens produced will then be ['this', 'is', 'something', 'i', 'want'], which can correctly be matched against the sample case.
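As a rough sketch of such an analyzer, the index settings below define a pattern tokenizer that splits on underscores and other non-word characters, followed by a lowercase filter. The names underscore_tokenizer and underscore_analyzer are made up for illustration, and the pattern is an assumption about what you want to split on.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "underscore_tokenizer": {
          "type": "pattern",
          "pattern": "[\\W_]+"
        }
      },
      "analyzer": {
        "underscore_analyzer": {
          "type": "custom",
          "tokenizer": "underscore_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

The field in your mapping would then need to reference this analyzer, and existing documents would have to be reindexed before the new tokens become searchable.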
Alternatively, if all you want is a document that starts with the specified text, you can use a phrase prefix query instead of fuzzy_like_this. See the Elasticsearch documentation on the phrase prefix query for details.
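A minimal sketch of the phrase prefix approach, assuming the same file field as in your query, could look like this. The max_expansions value is just an example of the knob that limits how many terms the final prefix may expand to.

```json
{
  "query": {
    "match_phrase_prefix": {
      "file": {
        "query": "This_is_something_I_want",
        "max_expansions": 50
      }
    }
  }
}
```

The last term of the query text is treated as a prefix, so this_is_something_i_want can match an indexed term such as this_is_something_i_want_to_search_for. Your existing highlight section can be kept alongside the query unchanged.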
Upvotes: 1