Martin

Bulkimport / arangoimp

Eventually I have to load 35 GB of data into an ArangoDB instance.
So far I've tried these approaches to load just 5 GB (and failed):

Could someone tell me how to fix those commands, or suggest a way to actually load this data?

Thanks

Edit for @DavidThomas, here are the specs:
- RAM: 128G
- CPU: 2x Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
- OS: Linux (ubuntu) sneezy 3.13.0-86-generic
- HDD: classic (non-SSD)

Answers (1)

dothebart

I hope you're not using ArangoDB 2.4, as your link to the arangoimp docs suggests? ;-)

For our performance blog post series, we imported the Pokec dataset using arangoimp. The server's maximum POST body size is 512 MB.
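If you end up sending chunks to the server from your own script, that 512 MB is the ceiling per request; even at the maximum, a 35 GB dump needs at least 70 requests (35 × 1024 MB / 512 MB). Below is a minimal Python sketch that builds (but does not send) one import request and refuses oversized bodies. The endpoint path `/_db/<database>/_api/import` with `type=documents`, the `_system` default database, the collection name `users` and the helper name `build_import_request` are my assumptions, so check them against the HTTP import API docs for your version; reading 512 MB as 512 × 1024 × 1024 bytes is also an assumption, and authentication is left out.

```python
import urllib.parse
import urllib.request

# The limit quoted above, assumed here to mean 512 * 1024 * 1024 bytes.
MAX_POST_BODY = 512 * 1024 * 1024


def build_import_request(base_url, collection, body, database="_system"):
    """Build (but do not send) one POST to the HTTP import API.

    `body` must already hold one JSON document per line, as bytes.
    The endpoint path and query parameters are assumptions; check them
    against the HTTP API docs for your server version.
    """
    if len(body) > MAX_POST_BODY:
        raise ValueError(f"body is {len(body)} bytes, over the 512 MB POST limit")
    query = urllib.parse.urlencode({"collection": collection, "type": "documents"})
    url = f"{base_url.rstrip('/')}/_db/{database}/_api/import?{query}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical server address and collection name.
req = build_import_request("http://localhost:8529", "users", b'{"a":1}\n{"b":2}\n')
print(req.get_method(), req.full_url)
# Actually sending it would be urllib.request.urlopen(req), plus credentials if auth is on.
```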

For performance reasons, arangoimp doesn't parse the JSON; instead, it relies on each line of your import file being exactly one document, so it can easily chop the file into chunks of valid JSON.

It therefore can't chunk JSON dumps where all documents are wrapped in one array, like this:

[
{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] },
{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }
]

and will therefore attempt to send the whole file at once; if that exceeds your specified --batch-size, you will get the "import file is too big" error message.
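To see why the array layout defeats line-based chunking, you can check which lines of a file are not standalone JSON documents. This is a hypothetical helper (`non_document_lines`), not part of arangoimp; it counts a line as importable only if it parses on its own as a JSON object. In the array dump above, the `[` and `]` lines and the trailing comma on the first document all break that rule.

```python
import json


def non_document_lines(lines):
    """Return (line_number, text) for every non-blank line that is not a
    standalone JSON object, i.e. lines a line-based splitter can't send."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines carry no document
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            bad.append((number, text))  # e.g. "[" or a document followed by ","
            continue
        if not isinstance(value, dict):
            bad.append((number, text))  # valid JSON, but not a document
    return bad


array_dump = [
    '[',
    '{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] },',
    '{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }',
    ']',
]
print([number for number, _ in non_document_lines(array_dump)])  # [1, 2, 4]
```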

However, if your file contains one document per line:

{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] }
{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }

it can chunk the file line by line according to --batch-size, which can be set as low as 32 KB.
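The line-based chunking described above can be sketched roughly like this. It illustrates the idea and is not arangoimp's actual implementation: `chunk_lines` is a hypothetical name, it measures `batch_size` in UTF-8 bytes, and it does not enforce the 32 KB minimum. Because it only ever cuts between lines, every batch it yields is a run of complete documents.

```python
def chunk_lines(lines, batch_size):
    """Group complete lines into batches of at most batch_size bytes.

    A single line larger than batch_size can't be split without breaking
    the document, so it is rejected instead.
    """
    batch, used = [], 0
    for line in lines:
        encoded = line.rstrip("\n").encode("utf-8") + b"\n"
        if len(encoded) > batch_size:
            raise ValueError(f"single document of {len(encoded)} bytes exceeds batch_size")
        if used + len(encoded) > batch_size:
            yield b"".join(batch)  # flush before this line would overflow
            batch, used = [], 0
        batch.append(encoded)
        used += len(encoded)
    if batch:
        yield b"".join(batch)  # whatever is left over


docs = ['{"a":1}', '{"b":2}', '{"c":3}']
for chunk in chunk_lines(docs, batch_size=16):
    print(len(chunk), chunk)
```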

You therefore need to prepare your dump along the guidelines above in order to use arangoimp.

Since arangoimp uses the same HTTP import API, it has the same limitations as using that API directly. You need to write a tiny program that uses a streaming JSON parser to translate your dump into one document per line. You can then either send chunks directly to the server from your script, or let arangoimp handle the chunking for you.
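A minimal sketch of such a tiny program in Python, assuming your dump is a single top-level JSON array of objects. Python's standard library has no streaming JSON parser, so instead of a dedicated one this pulls documents out of a sliding buffer with `json.JSONDecoder.raw_decode`; the function name `array_to_json_lines` and the `read_size` parameter are hypothetical. For a production run you may prefer a real streaming parser such as the third-party ijson package.

```python
import io
import json


def array_to_json_lines(src, dst, read_size=1 << 16):
    """Read a top-level JSON array of objects from the text stream `src`
    piece by piece and write each object to `dst` as one compact JSON
    document per line. Returns the number of documents written.

    Only the current chunk plus any partially read document is buffered,
    so the whole dump never has to fit into memory.
    """
    decoder = json.JSONDecoder()
    buf, eof, seen_open, written = "", False, False, 0
    while True:
        buf = buf.lstrip()  # whitespace between tokens is insignificant
        if buf and not seen_open:
            if not buf.startswith("["):
                raise ValueError("input is not a JSON array")
            buf, seen_open = buf[1:], True  # consume the opening '['
            continue
        if buf.startswith(","):  # separator between array elements
            buf = buf[1:]
            continue
        if buf.startswith("]"):  # closing bracket: the array is done
            return written
        if buf:
            if not buf.startswith("{"):
                raise ValueError("array elements must be JSON objects")
            try:
                doc, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more input, so the document is malformed
                # otherwise the document is probably just incomplete: read more
            else:
                dst.write(json.dumps(doc, ensure_ascii=False, separators=(",", ":")) + "\n")
                written += 1
                buf = buf[end:]
                continue
        if eof:
            raise ValueError("input ended before the closing ']'")
        chunk = src.read(read_size)
        eof = not chunk
        buf += chunk


# In-memory demo; for the real dump, pass open(...) file objects instead.
out = io.StringIO()
array_to_json_lines(io.StringIO('[{"a": 1}, {"b": 2}]'), out)
print(out.getvalue(), end="")
```

Requiring each element to start with `{` matters here: an object can only decode successfully once its closing brace has been read, so a document cut off at a chunk boundary is never emitted half-finished. The output file can then go to arangoimp as is.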
