Reputation: 8386
I have the following book indexed:
curl -X PUT localhost:9200/books/book/1 -d '{
"title": "All Quiet on the Western Front",
"author": "Erich Maria Remarque",
"year": 1929
}'
I'm trying to implement a Phrase Suggester using the code from the official docs, so I tried:
curl -XPOST 'localhost:9200/books/_search' -d '{
"suggest" : {
"text" : "al quet",
"simple_phrase" : {
"phrase" : {
"analyzer" : "body",
"field" : "bigram",
"size" : 1,
"real_word_error_likelihood" : 0.95,
"max_errors" : 0.5,
"gram_size" : 2,
"direct_generator" : [ {
"field" : "title",
"suggest_mode" : "always",
"min_word_length" : 1
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}'
I'm expecting this to correct al quet to all quiet.
But I get the following error:
"error" : {
"root_cause" : [ {
"type" : "illegal_argument_exception",
"reason" : "Analyzer [body] doesn't exists"
If I change "analyzer" : "body" to "analyzer" : "title", I get the same error but with title:
"error" : {
"root_cause" : [ {
"type" : "illegal_argument_exception",
"reason" : "Analyzer [title] doesn't exists"
If I change "analyzer" : "body" to "analyzer" : "default", it doesn't show an error on that line, but it shows an error for the next line, "field" : "bigram":
"error" : {
"root_cause" : [ {
"type" : "illegal_argument_exception",
"reason" : "No mapping found for field [bigram]"
The only way to make this work is to use "analyzer" : "default" and "field" : "title":
curl -XPOST 'localhost:9200/books/_search?pretty=true' -d '{
"suggest" : {
"text" : "al quet",
"simple_phrase" : {
"phrase" : {
"analyzer" : "default",
"field" : "title",
"size" : 1,
"real_word_error_likelihood" : 0.95,
"max_errors" : 0.5,
"gram_size" : 2,
"direct_generator" : [ {
"field" : "title",
"suggest_mode" : "always",
"min_word_length" : 1
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}'
With this I'm getting this output:
"suggest" : {
"simple_phrase" : [ {
"text" : "al quet",
"offset" : 0,
"length" : 7,
"options" : [ {
"text" : "al quiet",
"highlighted" : "al <em>quiet</em>",
"score" : 0.09049256
} ]
} ]
}
As you can see, it corrects quet to quiet but leaves al unchanged. The same thing happens with all my other attempts: only one word is corrected.
How can I build a successful phrase suggester that, for this example, takes the input al quet and returns all quiet?
Upvotes: 2
Views: 1678
Reputation: 12672
You got the first error because there is no analyzer named body in your index, and the same goes for title.
The second error is due to the missing field bigram: your index has only three fields, namely title, author, and year.
With your current setup, for the suggester to work correctly, you need to give max_errors a higher value. From the docs, max_errors is:
the maximum percentage of the terms that at most considered to be misspellings in order to form a correction. This method accepts a float value in the range [0..1) as a fraction of the actual query terms or a number >=1 as an absolute number of query terms. The default is set to 1.0 which corresponds to that only corrections with at most 1 misspelled term are returned. Note that setting this too high can negatively impact performance. Low values like 1 or 2 are recommended otherwise the time spend in suggest calls might exceed the time spend in query execution.
So this should give you the desired output:
{
"suggest": {
"text": "al quet",
"simple_phrase": {
"phrase": {
"analyzer": "default",
"field": "title",
"size": 1,
"real_word_error_likelihood": 0.95,
"max_errors": 0.9, <--- increase this value
"gram_size": 2,
"direct_generator": [
{
"field": "title",
"suggest_mode": "always",
"min_word_length": 1
}
],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
},
"size": 0
}
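To see why 0.5 corrected only one word while 0.9 can correct both, it helps to work out how many misspelled terms each value allows for the two-term input al quet. The sketch below is only an illustration of the rule quoted from the docs. The helper name allowed_misspellings is made up, and the rounding (nearest integer, with a floor of 1) is my assumption about how the fraction is turned into a term count, not something the quoted text states.

```shell
# Assumed rule (the rounding is NOT stated in the quoted docs):
#   max_errors <  1 -> fraction of the query terms, rounded to nearest, at least 1
#   max_errors >= 1 -> absolute number of terms
# Usage: allowed_misspellings <max_errors> <number_of_query_terms>
allowed_misspellings() {
  awk -v e="$1" -v n="$2" 'BEGIN {
    k = (e >= 1) ? int(e) : int(e * n + 0.5)
    if (k < 1) k = 1
    print k
  }'
}

allowed_misspellings 0.5 2   # prints 1: only one of "al", "quet" may be corrected
allowed_misspellings 0.9 2   # prints 2: both terms may be corrected
```

Under that assumption, an absolute value such as "max_errors": 2 would have the same effect for this input as 0.9.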
You might want to use shingles for phrases and collate to get only those results that are in the index. I have given a detailed answer to this question, which might help.
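As a rough sketch of what that could look like, assuming an Elasticsearch 2.x style index (a book mapping type and string fields, as in the question): the index below adds a title.bigram sub-field analyzed with a shingle filter, and the suggest request points at it and adds a collate query. The names bigram_analyzer, bigram_shingle, create_books_index and suggest_title are made up for this example. Newer versions use "text" instead of "string", drop mapping types, and use "source" instead of "inline" in collate, so adjust to your version.

```shell
ES_URL="${ES_URL:-localhost:9200}"   # assumed local node, as in the question

# Index definition: title gets a "bigram" sub-field that indexes word pairs.
INDEX_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "bigram_shingle": { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 2 }
      },
      "analyzer": {
        "bigram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "bigram_shingle" ]
        }
      }
    }
  },
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "fields": { "bigram": { "type": "string", "analyzer": "bigram_analyzer" } }
        }
      }
    }
  }
}'

# Suggest request: no "analyzer" is set, so the search analyzer of title.bigram is used.
# collate runs each suggestion as a query and keeps those that match a document.
SUGGEST_BODY='{
  "size": 0,
  "suggest": {
    "text": "al quet",
    "simple_phrase": {
      "phrase": {
        "field": "title.bigram",
        "size": 1,
        "max_errors": 2,
        "gram_size": 2,
        "direct_generator": [ { "field": "title", "suggest_mode": "always", "min_word_length": 1 } ],
        "collate": {
          "query": { "inline": { "match": { "{{field_name}}": "{{suggestion}}" } } },
          "params": { "field_name": "title" }
        }
      }
    }
  }
}'

# Nothing is sent until these functions are called.
create_books_index() { curl -XPUT "$ES_URL/books" -d "$INDEX_BODY"; }
suggest_title()      { curl -XPOST "$ES_URL/books/_search?pretty=true" -d "$SUGGEST_BODY"; }
```

If the books index already exists, it has to be deleted and the document reindexed before create_books_index will succeed, since analyzers are defined at index creation. After that, run suggest_title to send the request.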
Upvotes: 3