Reputation: 451
We have an Elasticsearch index with the following configuration:
PUT phonebook
{
"settings":{
"index":{
"number_of_shards":8,
"number_of_replicas":1
}
},
"mappings":{
"person":{
"_all":{
"enabled":false
},
"_source":{
"enabled":true
},
"properties":{
"id":{
"type":"long"
},
"name":{
"type":"text",
"index_options":"positions"
},
"number":{
"type":"long"
}
}
}
}
}
It's basically a huge phonebook with billions of records. I'm searching on this index with the following query:
GET /phonebook/person/_search
{
"size":0,
"query":{
"match":{
"name":{
"fuzziness":1,
"query":"george bush",
"operator":"and"
}
}
},
"aggs":{
"by_number":{
"terms":{
"field":"number",
"size":10,
"order":{
"max_score":"desc"
}
},
"aggs":{
"max_score":{
"max":{
"script":"_score"
}
},
"sample":{
"top_hits":{
"size":1
}
}
}
}
}
}
The results are grouped by the "number" field, and the best match for each number is returned. What I need, however, is custom scoring or sorting based on whether the words appear in the same order as in the query, so that "George Bush" always scores higher than "Bush George" for the query "George Bush". A match_phrase query is not suitable for me because it does not support the fuzziness I rely on.
Upvotes: 1
Views: 91
Reputation: 1166
How about something like this:
"query":{
"simple_query_string": {
"query": "\"barack~ obama~\"~3",
"fields": ["name"]
}
},
The trailing ~ after each token is for the fuzzy aspect, and the ~3 after the phrase handles slop, which is the concept I think you are looking for with phrase queries. I think the results will be scored such that "Barack Obama" ranks higher than "Obama Barack" with this. You can also come up with a custom bool query that mimics this, where the should clause handles both the fuzziness and the slop aspects.
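One way such a bool query might look is sketched below, using the question's "george bush" example. It is an illustration rather than a tested solution: the must clause keeps the original fuzzy match, and the should clause adds an ordered span_near over fuzzy terms so that documents with the words in query order pick up extra score. The slop of 3 mirrors the ~3 above, and the lowercase terms assume the name field uses the default standard analyzer, since fuzzy clauses inside span_multi are not analyzed.

```json
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": {
            "query": "george bush",
            "operator": "and",
            "fuzziness": 1
          }
        }
      },
      "should": {
        "span_near": {
          "clauses": [
            { "span_multi": { "match": { "fuzzy": { "name": { "value": "george", "fuzziness": 1 } } } } },
            { "span_multi": { "match": { "fuzzy": { "name": { "value": "bush", "fuzziness": 1 } } } } }
          ],
          "slop": 3,
          "in_order": true
        }
      }
    }
  }
}
```

Because in_order is true, "Bush George" would not match the should clause and would be scored by the must clause alone, while "George Bush" and close misspellings in the same order would get the additional boost. On an index of this size the fuzzy span expansion may be expensive, so it is worth measuring before relying on it.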
Upvotes: 1