Reputation: 11
I need to use machine learning algorithms to sort / rank query results. Our queries run on Elasticsearch. For that, we need to combine data from the document itself, from the explanation section (although the explanation should not be returned), and from external sources.
This is a pretty heavy computation, so I don't want to run the ranking algorithms on all documents, but only on the top 1000, and return my top 100.
A scoring plugin will run on all documents, and I didn't see any option to create a plugin for the rescoring phase. So it seems like I must create a sorting plugin.
My question is: how many documents run through the sorting phase? Is there any way to control it (like window_size in rescore)? What happens if I have pagination, does my sorting run again? Is it possible to get 1000 docs with the explanation section into the sorting phase and return only 100 without the explanation?
Thanks!
Upvotes: 1
Views: 715
Reputation: 206
-This is a pretty heavy computation, so I don't want to run the ranking algorithms on all documents, but only on the top 1000, and return my top 100.
Use rescoring combined with your scoring plugin. The rescoring algorithm runs only on the top N results.
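As a rough sketch of what that request could look like, here is a small Python helper that builds a search body with a rescore block. The `rescore`, `window_size`, `rescore_query`, `query_weight` and `rescore_query_weight` keys are the standard rescore request fields; the script `lang` and `source` values (`my_ml_lang`, `my_ml_model`) and the `title` field are made-up placeholders for whatever your plugin registers, and setting `query_weight` to 0 is just one possible choice so that only the expensive score decides the final order. Note that `window_size` applies per shard.

```python
import json


def build_rescore_request(base_query, window_size=1000, page_size=100):
    """Build a search body whose expensive scoring runs only in the rescore phase."""
    return {
        "size": page_size,      # number of hits actually returned
        "query": base_query,    # cheap first-pass query, scored on every match
        "rescore": {
            "window_size": window_size,  # top hits per shard handed to the rescorer
            "query": {
                "rescore_query": {
                    "function_score": {
                        "script_score": {
                            # Hypothetical names: replace with what your plugin exposes.
                            "script": {"lang": "my_ml_lang", "source": "my_ml_model"}
                        }
                    }
                },
                "query_weight": 0.0,          # ignore the first-pass score
                "rescore_query_weight": 1.0,  # rank purely by the expensive score
            },
        },
    }


body = build_rescore_request({"match": {"title": "laptop"}})
print(json.dumps(body, indent=2))
```

If you page through results, it is probably safest to keep `window_size` the same for every page, otherwise the order of the top hits can shift between pages.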
-how many documents run through the sorting phase?
All documents that match your query. If you ask for N docs, each shard sends its top N, and those are then merged together.
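To illustrate the merge step, here is a toy Python model of it. This is not Elasticsearch code, just a sketch under the assumption that each hit is a `(score, doc_id)` pair and that a higher score ranks first; the function name and the sample data are invented for the example.

```python
import heapq


def merge_shard_results(shard_hits, n):
    """Return the global top n (score, doc_id) pairs from per-shard hit lists."""
    # Each shard only contributes its own top n candidates...
    candidates = [hit for hits in shard_hits for hit in heapq.nlargest(n, hits)]
    # ...and the coordinating step keeps the best n of those.
    return heapq.nlargest(n, candidates)


shards = [
    [(0.9, "a"), (0.4, "b"), (0.1, "c")],  # hits on shard 0
    [(0.8, "d"), (0.7, "e")],              # hits on shard 1
]
top = merge_shard_results(shards, 2)
print(top)
```

Every matching document still has to be scored on its shard before the shard can know which N are its best, which is why a heavy sort script is expensive even for a small N.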
-What happens if I have pagination, does my sorting run again?
Yes, sorting runs again. Worse, if you ask for documents from 100000 to 100010, sorting happens for 100010 docs per shard.
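The arithmetic behind that can be written down in a couple of lines of Python. The function name is made up, and the shard count of 5 is only an assumption for the example.

```python
def deep_page_cost(from_, size, num_shards):
    """Return (hits kept per shard, hits merged by the coordinating node)."""
    per_shard = from_ + size  # each shard must keep its top from + size hits
    return per_shard, per_shard * num_shards


# Documents 100000 to 100010 on an index with 5 shards (assumed).
per_shard, at_coordinator = deep_page_cost(100000, 10, 5)
print(per_shard, at_coordinator)
```

So a page that returns 10 hits still makes every shard keep 100010 sorted entries, and the node that merges them sees `num_shards` times that many.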
Upvotes: 1