
Reputation: 71

How to speed up Elasticsearch scroll in python

I need to fetch data for a certain period of time through the Elasticsearch API, run some customized analysis on it in Python, and display the results on a dashboard.

There are about two hundred thousand records every 15 minutes, indexed by date.

Currently I use scroll/scan to get the data, but it takes nearly a minute to fetch 200,000 records, which seems too slow.

Is there any way to process this data more quickly? And can I use something like Redis to save the results and avoid repeating work?

Upvotes: 1

Views: 745

Answers (1)

James Daily

Reputation: 146

Is it possible to do the analysis on the Elasticsearch side using aggregations?
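For example, if the analysis boils down to per-interval statistics, a single aggregation request can replace the whole scroll. Here is a minimal sketch of such a request body; the index structure, the `@timestamp` and `value` field names, and the `fixed_interval` syntax (Elasticsearch 7+) are assumptions, not taken from the question:

```python
# Hypothetical aggregation body: bucket the last 15 minutes per minute
# and compute an average, instead of downloading every document.
agg_query = {
    "size": 0,  # return no hits, only aggregation buckets
    "query": {
        "range": {"@timestamp": {"gte": "now-15m", "lt": "now"}}
    },
    "aggs": {
        "per_minute": {
            "date_histogram": {
                "field": "@timestamp",   # assumed timestamp field
                "fixed_interval": "1m",  # "interval" on older ES versions
            },
            "aggs": {
                "avg_value": {"avg": {"field": "value"}}  # assumed metric field
            },
        }
    },
}
```

You would send this via `es.search(index=..., body=agg_query)` and read the buckets from the response, moving the heavy lifting off the Python side entirely.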

Assuming you're not doing it already, you should use _source to only download the absolute minimum data required. You could also try increasing the size parameter to scan() from the default of 1000. I would expect only modest speed improvements from that, however.
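Assuming the official `elasticsearch-py` client and its `helpers.scan()`, the two knobs above can be sketched like this. The function below only builds the keyword arguments you would pass to `helpers.scan(es, **kwargs)`; the index and field names are placeholders:

```python
def build_scan_kwargs(index, fields, page_size=5000):
    """Build keyword arguments for elasticsearch.helpers.scan().

    - "_source" limits each hit to the listed fields, shrinking the payload.
    - "size" raises the per-shard page size from the default of 1000.
    """
    return {
        "index": index,
        "query": {
            "_source": fields,              # fetch only what the analysis needs
            "query": {"match_all": {}},     # placeholder; use your real filter
        },
        "size": page_size,
    }

# Hypothetical usage: helpers.scan(es, **kwargs)
kwargs = build_scan_kwargs("logs-2019.01.01", ["@timestamp", "status"])
```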

If the historical data doesn't change, then a cache like Redis (or even just a local file) could be a good solution. If the historical data can change, then you'd have to manage cache invalidation.

Upvotes: 1
