Taimoor Khan

Reputation: 615

Elasticsearch data binary ran out of memory

I'm trying to upload an 800GB file to Elasticsearch, but I keep getting a memory error that tells me the data binary is out of memory. I have 64GB of RAM on my system and 3TB of storage.

curl -XPOST 'http://localhost:9200/carrier/doc/1/_bulk' --data-binary @carrier.json

I'm wondering if there is a setting in the config file to increase the amount of memory so I can upload this file.

thanks

Upvotes: 4

Views: 2508

Answers (1)

Val

Reputation: 217344

800GB is quite a lot to send in one shot. ES has to put the whole request body into memory in order to process it, and curl also reads the entire file into memory when you use --data-binary @file, so that's far too big for the 64GB of RAM you have.

One way around this is to split your file into several smaller files and send them one after another. You can achieve this with a small shell script like the one below.

#!/bin/sh

# split the main file into files containing 10,000 lines max
split -l 10000 -a 10 carrier.json /tmp/carrier_bulk

# send each split file
BULK_FILES=/tmp/carrier_bulk*
for f in $BULK_FILES; do
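    # the Content-Type header is required by Elasticsearch 6.0 and later
    curl -s -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/_bulk --data-binary "@$f"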
done
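
A note on the chunk size: in the bulk format each index action line is followed by its source line, so the number of lines per chunk should be even, otherwise a document gets cut in half at a chunk boundary. 10,000 is fine. As a minimal sketch, assuming your file contains only action/source pairs (two lines per document), a guard could look like this. The function name split_bulk_file is just something I made up for illustration:

```shell
#!/bin/sh
# split_bulk_file FILE LINES PREFIX
# Splits FILE into chunks of LINES lines each, refusing odd chunk sizes
# so that an action line is never separated from its source line.
# Assumption: every document is one action line followed by one source line.
split_bulk_file() {
    file=$1
    lines=$2
    prefix=$3
    if [ $(( lines % 2 )) -ne 0 ]; then
        echo "chunk size must be even, got $lines" >&2
        return 1
    fi
    split -l "$lines" -a 10 "$file" "$prefix"
}
```

You would call it as split_bulk_file carrier.json 10000 /tmp/carrier_bulk in place of the plain split command.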

UPDATE

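If you want to interpret the ES response, you can do so easily by piping it to a small Python one-liner like the one below. It is written for Python 3, and $ES_HOST stands for your Elasticsearch URL, e.g. http://localhost:9200:

curl -s -XPOST -H 'Content-Type: application/x-ndjson' "$ES_HOST/_bulk" --data-binary "@$f" | python3 -c 'import json,sys;obj=json.load(sys.stdin);print("    <- Took %s ms with errors: %s" % (obj["took"], obj["errors"]))'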
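
If you would rather stop at the first failing chunk instead of only printing the status, one option is to test the top-level "errors" flag of the response and abort when it is true. This is only a rough sketch: the helper name bulk_failed is hypothetical, and the grep pattern is a shortcut rather than real JSON parsing, so it could misfire if an item in the response happens to contain the same text.

```shell
#!/bin/sh
# bulk_failed reads a _bulk response on stdin and exits 0
# when the response contains "errors": true, 1 otherwise.
# Assumption: a plain text match is good enough; this is not a JSON parser.
bulk_failed() {
    grep -Eq '"errors"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical usage inside the loop over the split files:
#   if curl -s -XPOST http://localhost:9200/_bulk --data-binary "@$f" | bulk_failed; then
#       echo "bulk request for $f reported errors" >&2
#       exit 1
#   fi
```

With 800GB of input this saves you from discovering a mapping problem only after every chunk has been sent.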

Upvotes: 8
