Reputation: 1379
I'm running a Wagtail site (1.11) with Elasticsearch (5.5) as the search backend, indexing multiple fields, e.g.:
search_fields = Page.search_fields + [
    index.SearchField('body'),
    index.SearchField('get_post_type_display'),
    index.SearchField('document_excerpt', boost=2),
    index.SearchField('get_dark_data_full_text'),
]
In my search results template, I would like to indicate which field produced the 'hit' (or, even better, display a snippet of the hit, but that seems to be a separate question).
This question seems to address my issue, but I don't know how to integrate it into my Wagtail site.
Any tips on how to get this information and how to integrate it into Wagtail search?
Upvotes: 2
Views: 385
Reputation: 14621
Elasticsearch has an Explain API, which shows how the score of a specific document (identified by its id) was computed for a given query, broken down by field.
Here is the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
It shows how each field was boosted and how the score was built.
For example, if your hit's max_score was 2.0588222 and you want to know which fields contributed to this score, you can use the Explain API.
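As a rough sketch of what such a request could look like from Python: in Elasticsearch 5.x the Explain endpoint is addressed as /<index>/<type>/<id>/_explain and takes the query in the request body. The helper name build_explain_request, the localhost URL and the multi_match query below are assumptions made for illustration only; the index, type and id are taken from the example response in this answer.

```python
import json
import urllib.request


def build_explain_request(base_url, index, doc_type, doc_id, query):
    """Build (but do not send) an Explain API request for one document."""
    # Elasticsearch 5.x path layout: /<index>/<type>/<id>/_explain
    url = "{}/{}/{}/{}/_explain".format(
        base_url.rstrip("/"), index, doc_type, doc_id
    )
    # The query to explain goes in the body, just like in a search request.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_explain_request(
    "http://localhost:9200",  # assumed local cluster
    "customer_test",
    "customer",
    "597f2b3a79c404fafefcd46e",
    # Hypothetical query; use the same query your search actually runs.
    {"multi_match": {"query": "merge doe", "fields": ["firstName", "lastName"]}},
)
# To actually send it:
#     response = json.load(urllib.request.urlopen(request))
```

Only the request is built here, so nothing is sent until you pass it to urlopen. The query you explain should be the same one your search runs, otherwise the scores will not match your hits.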
This is an example of an explain query response where you see that field firstName contributed 1.2321436 to the max score and lastName contributed 0.8266786:
{
"_index" : "customer_test",
"_type" : "customer",
"_id" : "597f2b3a79c404fafefcd46e",
"matched" : true,
"explanation" : {
"value" : **2.0588222**,
"description" : "sum of:",
"details" : [ {
"value" : 2.0588222,
"description" : "sum of:",
"details" : [ {
"value" : **1.2321436**,
"description" : "weight(firstName:merge in 23) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.2321436,
"description" : "score(doc=23,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [ {
"value" : 1.2321436,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [ {
"value" : 3.0,
"description" : "docFreq",
"details" : [ ]
}, {
"value" : 11.0,
"description" : "docCount",
"details" : [ ]
} ]
}, {
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
}, {
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
}, {
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "avgFieldLength",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "fieldLength",
"details" : [ ]
} ]
} ]
} ]
}, {
"value" : 0.8266786,
"description" : "weight(lastName:doe in 23) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.8266786,
"description" : "score(doc=23,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [ {
"value" : **0.8266786**,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [ {
"value" : 3.0,
"description" : "docFreq",
"details" : [ ]
}, {
"value" : 7.0,
"description" : "docCount",
"details" : [ ]
} ]
}, {
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
}, {
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
}, {
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "avgFieldLength",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "fieldLength",
"details" : [ ]
} ]
} ]
} ]
} ]
}, {
"value" : 0.0,
"description" : "match on required clause, product of:",
"details" : [ {
"value" : 0.0,
"description" : "# clause",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "_type:customer, product of:",
"details" : [ {
"value" : 1.0,
"description" : "boost",
"details" : [ ]
}, {
"value" : 1.0,
"description" : "queryNorm",
"details" : [ ]
} ]
} ]
} ]
}
}
About Wagtail: I have no experience with it, but you can call the Elasticsearch REST API directly and parse the JSON of an Explain response.
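Parsing that JSON could look roughly like the sketch below. It walks the "explanation" tree and sums the value of every node whose description starts with "weight(<field>:", which is how the per-field term scores appear in the response above. The function name field_contributions is made up for this example, and the regular expression assumes simple term matches like the ones shown; other query types may describe their scores differently. The field names are whatever the index mapping uses, which may not be identical to your model attribute names.

```python
import re
from collections import defaultdict

# Per-field term scores are described as "weight(<field>:<term> in <doc>) ..."
WEIGHT_RE = re.compile(r"^weight\((?P<field>[^:]+):")


def field_contributions(node, totals=None):
    """Sum the score contributed by each field in an explanation tree."""
    if totals is None:
        totals = defaultdict(float)
    match = WEIGHT_RE.match(node.get("description", ""))
    if match:
        totals[match.group("field")] += node["value"]
        # The children only break this weight down further (idf, tfNorm),
        # so there is no need to descend into them.
        return totals
    for child in node.get("details", []):
        field_contributions(child, totals)
    return totals


# Trimmed-down version of the "explanation" object from the response above.
explanation = {
    "value": 2.0588222,
    "description": "sum of:",
    "details": [
        {
            "value": 1.2321436,
            "description": "weight(firstName:merge in 23) [PerFieldSimilarity], result of:",
            "details": [],
        },
        {
            "value": 0.8266786,
            "description": "weight(lastName:doe in 23) [PerFieldSimilarity], result of:",
            "details": [],
        },
    ],
}

hits = dict(field_contributions(explanation))
```

Here hits maps each matching field name to its share of the score, so the keys tell you which fields produced the hit. You could attach that dict to each result before handing it to the template.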
Upvotes: 3