Reputation: 14859
When I try to insert new data into Elasticsearch from Python, I'm hitting a wall.
Elasticsearch and the Python script run on the same server, but every request seems to be indexed before the call returns to me.
That means it takes a long time to handle around 1,000 requests or so. So what I'm thinking is:
would it be possible to insert without indexing until the script is done, and then run a re-index?
My Python code looks like this:
from elasticsearch import Elasticsearch

# One HTTP round trip per document: each call indexes a single doc and waits for the response
es = Elasticsearch([{'host': str(config['elastic']['host']), 'port': str(config['elastic']['port'])}])
res = es.index(index="test-index", doc_type='products', id=product['uuid'], body=data)
print(res['created'])
If it helps: there are only around 200,000-250,000 documents in the database, which is why I can't understand why inserting is so slow while reading from an index is very fast.
Final code sample - how to use the bulk API
from elasticsearch import Elasticsearch
import json

es = Elasticsearch([{'host': str(config['elastic']['host']), 'port': str(config['elastic']['port'])}])
data = {'field': 'value'}
# The bulk body is newline-delimited JSON: one action line, then (for index ops) the document source on the next line
bulk = ""
bulk += json.dumps({"index": {"_index": "index-name", "_type": "doc-type", "_id": "id-need-effect"}}) + "\n"
bulk += json.dumps(data) + "\n"
# Delete operations have no source line, only the action line
bulk += json.dumps({"delete": {"_index": "index-name", "_type": "doc-type", "_id": "id-want-to-delete"}}) + "\n"
es.bulk(body=bulk)
Remember the newline ( \n ) every time you add an operation, and in this case you will no longer hit the performance issue I was hitting before.
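One quick sanity check worth bolting on (a sketch, assuming the response format of the 5.x-era client used above): the bulk response carries an 'errors' flag plus per-item results, so partial failures can be caught like this:
res = es.bulk(body=bulk)
if res['errors']:
    # Each entry of res['items'] mirrors one operation; failed ones carry an 'error' field
    failed = [item for item in res['items'] if list(item.values())[0].get('error')]
    print('%d bulk operations failed' % len(failed))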
Upvotes: 1
Views: 651
Reputation: 9320
I think that usage of the Bulk API
could help you: instead of sending 1 doc -> indexing it -> committing, you would send 1000 docs -> index -> commit, which should be much faster.
Taken from official Elastic documentation:
The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.
For more information on how to use it in Python, see http://elasticsearch-py.readthedocs.io/en/master/helpers.html
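For example, a minimal sketch with the helpers module (the index/type names are taken from the question, 'documents' is a placeholder iterable, and the 5.x-era client with doc types is assumed):
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()
# One action dict per document; _source holds the document body
actions = [
    {
        "_op_type": "index",
        "_index": "test-index",
        "_type": "products",
        "_id": product['uuid'],
        "_source": data,
    }
    for product, data in documents  # placeholder: yields (product, data) pairs
]
# helpers.bulk chunks the actions (500 per request by default) and returns (successes, errors)
success, errors = bulk(es, actions)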
Upvotes: 1