Reputation: 110083
I am looking to sync an Elasticsearch index to another data source. To get the database data I can do:
SELECT _id, md5 FROM history
What would be the fastest way to do this in ES? I've tried using the scroll API, but it seems deathly slow, having a 10k limit:
es.search(index='history', _source=['_id', 'md5'], size=10000)
Is there a better way to do this?
Upvotes: 1
Views: 101
Reputation: 6066
The scroll API can be run in parallel by using slices. In theory, N slices can give up to an N-fold speedup.
The slowness of scroll is due to the fact that Elasticsearch needs to perform a full scan.
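A minimal sketch of the sliced scroll idea, in case it helps. It assumes `es` is an `elasticsearch-py` client that accepts the `body=` style of `search` (newer client versions prefer keyword arguments, so adjust as needed). The function names, page size, and keep-alive value are my own choices, not anything prescribed by Elasticsearch. Note that the slice `max` has to be at least 2.

```python
from concurrent.futures import ThreadPoolExecutor


def scroll_slice(es, index, slice_id, max_slices, page_size=1000, keep_alive="2m"):
    """Read one slice of the index and return {_id: md5} for its documents."""
    body = {
        "slice": {"id": slice_id, "max": max_slices},
        "query": {"match_all": {}},
        "_source": ["md5"],  # _id is returned with every hit, so only md5 is requested
        "sort": ["_doc"],    # index order is the cheapest order for a full scan
    }
    resp = es.search(index=index, body=body, scroll=keep_alive, size=page_size)
    scroll_id = resp["_scroll_id"]
    result = {}
    try:
        # Keep pulling pages until this slice returns an empty page.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                result[hit["_id"]] = hit.get("_source", {}).get("md5")
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Release the server-side scroll context for this slice.
        es.clear_scroll(scroll_id=scroll_id)
    return result


def fetch_all_md5(es, index, max_slices=4):
    """Run every slice in its own thread and merge the per-slice results."""
    with ThreadPoolExecutor(max_workers=max_slices) as pool:
        futures = [
            pool.submit(scroll_slice, es, index, i, max_slices)
            for i in range(max_slices)
        ]
        merged = {}
        for future in futures:
            merged.update(future.result())
    return merged
```

Calling `fetch_all_md5(es, 'history')` would then give a dict that can be compared against the rows from the database query. How much faster this is in practice depends on the number of shards and on how busy the cluster is.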
For syncing Elasticsearch with other databases, I would recommend putting a document queue in front of Elasticsearch that sends documents both to Elasticsearch and to the other components. Apache Kafka is one example of such a queue. To my knowledge, there is no mechanism for sending new document updates from Elasticsearch to a third-party component.
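As a rough illustration of the queue approach, the writer side could look something like the sketch below. It assumes a `kafka-python`-style producer, i.e. an object with `send(topic, key=..., value=...)` that takes bytes when no serializers are configured. The function name and the topic name are hypothetical.

```python
import json


def publish_document(producer, topic, doc_id, doc):
    """Send one document change to the queue, keyed by its document id."""
    producer.send(
        topic,
        # Using the id as the key keeps all changes to one document in one partition.
        key=str(doc_id).encode("utf-8"),
        value=json.dumps(doc, sort_keys=True).encode("utf-8"),
    )
```

With `kafka-python` this would be used roughly as `publish_document(KafkaProducer(bootstrap_servers='localhost:9092'), 'history-docs', doc_id, doc)`, where the broker address and topic are placeholders. One consumer would then index each message into Elasticsearch and another would write it to the other data source, so neither side has to be scanned to find out what changed.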
Hope that helps!
Upvotes: 1