Reputation: 1565
I am currently trying to solve the following problem: we have an index from which I want to fetch every document and apply an update to a few fields on each one. This requires paginating through the documents during the fetch.
Context - the architecture is like this: a daily cron job writes to all documents (a full update) once a day; let's say it takes 2 hours to complete. During that window, multiple Kafka consumers also perform writes on the same index (also full updates) with different data. We do full updates because our schema uses nested fields.
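To make the full-update pattern concrete, here is a minimal sketch assuming the Python elasticsearch client (8.x); the endpoint, the index name products, and the nested variants field are hypothetical placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Hypothetical document with a nested field. Because updating individual
# nested objects in place is awkward, every writer (the cron job and the
# Kafka consumers) re-indexes the whole document instead.
doc = {
    "name": "widget",
    "variants": [                       # mapped as a "nested" field
        {"sku": "W-1", "stock": 10},
        {"sku": "W-2", "stock": 3},
    ],
}

# Full update: this index call replaces the entire existing document.
es.index(index="products", id="widget-1", document=doc)
```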
To solve the problem statement as described above, we could use the Scroll API to fetch all documents, apply the update, and do a bulk index. However, I want to take care of two things, _primary_term and _seq_no, which are not given in the Scroll API response. This is where PIT with search_after comes in useful.

With PIT and search_after, I want to understand what will happen when a snapshot of the index is taken and kept alive for "1m" on each paginated request while other writes/deletes from the Kafka consumers are happening simultaneously - how will things work here? I am not sure whether I would end up applying a full update to a document using stale data, and I always want to keep the latest data. It would be great if the community could give me better clarity on this usage and on whether this approach is sound.
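For concreteness, here is a minimal sketch of the flow I have in mind, again assuming the Python elasticsearch client (8.x); the index name products, the page size, and the update_fields helper are hypothetical:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

def update_fields(source):
    # Hypothetical transformation applied by the daily cron job.
    source["refreshed"] = True
    return source

# Open a point in time so every page reads from one consistent snapshot.
pit_id = es.open_point_in_time(index="products", keep_alive="1m")["id"]

search_after = None
try:
    while True:
        resp = es.search(
            size=500,
            pit={"id": pit_id, "keep_alive": "1m"},  # keep-alive renewed per request
            sort=[{"_shard_doc": "asc"}],            # tiebreaker required for search_after
            search_after=search_after,
            seq_no_primary_term=True,                # return _seq_no / _primary_term per hit
        )
        hits = resp["hits"]["hits"]
        if not hits:
            break
        pit_id = resp["pit_id"]  # the PIT id may change between requests

        ops = []
        for hit in hits:
            # Condition each write on the _seq_no / _primary_term seen in the
            # snapshot: if a Kafka consumer has updated the document since the
            # PIT was opened, Elasticsearch rejects this write with a 409
            # version conflict instead of silently overwriting the newer data.
            ops.append({
                "index": {
                    "_index": "products",
                    "_id": hit["_id"],
                    "if_seq_no": hit["_seq_no"],
                    "if_primary_term": hit["_primary_term"],
                }
            })
            ops.append(update_fields(hit["_source"]))

        bulk_resp = es.bulk(operations=ops)
        for item in bulk_resp["items"]:
            if item["index"].get("status") == 409:
                # Snapshot data for this doc is stale: re-fetch the live doc
                # and retry, or skip it, since the consumer's data is newer.
                pass

        search_after = hits[-1]["sort"]
finally:
    es.close_point_in_time(id=pit_id)
```

My understanding is that the "1m" keep-alive only has to outlive the gap between consecutive page requests (each request renews it), while the snapshot contents stay fixed at the moment the PIT was opened - which is exactly why the conditional if_seq_no/if_primary_term check on the bulk write seems necessary. I would appreciate confirmation of whether this reasoning holds.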
Thanks, Harshit
Upvotes: 0
Views: 21