Reputation: 43
Eventually I'll have to load 35GB of data into an ArangoDB instance.
So far I've tried these approaches to load just 5GB (and failed):
- Loading via Gremlin: it worked, but it took something like 3 days; this is not an option.
- The bulk import API endpoint, but I got the following error:
...[1] WARNING maximal body size is 536870912, request body size is -2032123904
- The arangoimp command, but I ended up with two different errors:
  - without --batch-size, it fires import file is too big. please increase the value of --batch-size
  - with --batch-size, it returns the same error as the bulk import.
Could someone tell me how to fix those commands, or suggest another way to actually load this data?
Thanks
Edit for @DavidThomas, here are the specs:
- RAM: 128G
- CPU: 2x Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
- OS: Linux (Ubuntu) sneezy 3.13.0-86-generic
- HDD: classic (non SSD)
Upvotes: 1
Views: 652
Reputation: 6067
I hope you're not using ArangoDB 2.4 as in your link to ArangoImp? ;-)
For our performance blog post series we imported the Pokec dataset using arangoimp. The maximum POST body size of the server is 512 MB (536,870,912 bytes, the limit shown in your error message).
For performance reasons, arangoimp doesn't parse the JSON; it relies on each line of your import file being one document, so it can easily chop the input into chunks of valid JSON.
It therefore can't chunk JSON dumps like this:
[
{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] },
{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }
]
and thus will attempt to send the whole file at once; if that exceeds your specified --batch-size, you will get the import file is too big error message.
However, if your file contains one document per line:
{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming"] }
{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }
it can chunk the input line by line according to --batch-size, down to a minimum batch size of 32 KB.
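For illustration, an invocation could look like the following (collection name, file name and batch size are placeholders I've chosen, not values from your setup; --batch-size is given in bytes, here 32 MB):

arangoimp --file dump.jsonl --collection mycollection --type json --batch-size 33554432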
You therefore need to prepare your dump along the guidelines above in order to use arangoimp.
Since arangoimp also uses the import API, it has the same limitations as using it raw. You need to write a small program using a stream-enabled JSON parser that rewrites the dump as one document per line; a sketch follows below. You may then send chunks directly to the server from your script, or use arangoimp to handle the chunking for you.
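A minimal sketch of such a converter in Python, assuming the third-party ijson streaming parser is installed (pip install ijson) and that your dump is one top-level JSON array; the file names are placeholders:

import json
import ijson  # streaming JSON parser; never loads the whole file

# Hypothetical file names; adjust to your dump.
with open("dump.json", "rb") as src, open("dump.jsonl", "w") as dst:
    # ijson.items() yields each element of the top-level array one at a
    # time, so the 35GB file is never held in memory at once.
    for doc in ijson.items(src, "item"):
        # ijson parses numbers as decimal.Decimal, which json.dumps cannot
        # serialize natively; default=float converts them back.
        dst.write(json.dumps(doc, default=float) + "\n")

The resulting one-document-per-line file can then be fed to arangoimp as shown above, letting it handle the chunking for you.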
Upvotes: 1