BatScream

Reputation: 19700

Store large binary files as attachments in Elasticsearch 1.3.2

I need to store a large PDF (120 MB) in Elasticsearch.

I ran the following commands through Cygwin:

$ curl -XPUT 'localhost:9200/samplepdfs/' -d '{
  "settings": {
    "index": {
      "number_of_ shards": 1,
      "number_of_replicas": 0
    }
  }
}'

{
  "acknowledged": true
}

$ coded=`cat sample.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`

$ json="{\"file\":\"${coded}\"}"

$ echo $json > json.file

$ curl -XPOST 'localhost:9200/samplepdfs/attachment/1' -d @json.file

and the server threw an out-of-memory exception:

at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.appendToCumulation(HttpChunkAggregator.java:208)

Kindly suggest a solution/configuration change to resolve the issue.

Upvotes: 2

Views: 3689

Answers (1)

progrrammer

Reputation: 4489

The error is easy to understand: you are asking a small machine to do a large job. From your configuration I guess you are running a single node with 512 MB or maybe 2 GB of heap allocated to Elasticsearch.

2 GB of heap is not enough for this document. A 120 MB PDF grows to roughly 160 MB once base64-encoded, and the whole request body has to be buffered in memory before indexing; that buffering is the HttpChunkAggregator in your stack trace, and it is where the node runs out of heap.
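If you want to confirm how much heap the node actually has before changing anything, the node stats API reports it (a quick check, assuming a default single-node install listening on localhost:9200; look at jvm.mem.heap_max_in_bytes and heap_used_in_bytes in the response):

$ curl 'localhost:9200/_nodes/stats/jvm?pretty'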

So, what's the solution?

  1. Buy more RAM and allocate 8 GB or more of heap to Elasticsearch (see the sketch after this list).
  2. Use more machines, which means splitting your index into at least 5 shards.
  3. If you can, break your file into smaller parts (probably not possible for the PDF you are trying to index).
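For option 1, a minimal sketch of how the heap is usually raised on a 1.x install, assuming you start Elasticsearch from the shell and the machine has the physical RAM to spare (ES_HEAP_SIZE is the standard variable the startup script reads; 8g is just an example value):

$ export ES_HEAP_SIZE=8g
$ ./bin/elasticsearch

For option 2, you would create the index with more shards up front, since the shard count cannot be changed after creation, e.g.:

$ curl -XPUT 'localhost:9200/samplepdfs/' -d '{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 0
    }
  }
}'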

References

http://elasticsearch-users.115913.n3.nabble.com/How-to-index-text-file-having-size-more-than-the-system-memory-td4028184.html

Hope this solves the problem. Thanks.

Upvotes: 3
