Reputation: 21
I recently downloaded neo4j 2.1.5 and am using webadmin (i.e. the browser interface). I now have to load a pretty big dataset into it, about 20 million records. I was able to feed in 5 million with no problem.
However, I am not able to do so with the bigger (20 million) dataset. I use the LOAD CSV command with 1000 rows per commit (I have also tried 5, 10, 100, 10000, and 100000). I have tried many different settings (cache_type = none or weak; using the OS buffers or neo4j's own cache), but I only get "Java heap space" or "Failed to mark transaction as rollback only" errors. I have also tried different initial and maximum heap sizes. Splitting the file into pieces of 5 million records each did not help either; feeding any of the pieces produces the same "Java heap space" error.
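For reference, the LOAD CSV statement I describe above had this general shape (the file path, label, and property names here are placeholders, not my actual schema):

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///path/to/data.csv' AS line
CREATE (:Record {name: line[0], surname: line[1]})
```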
One thing I notice, however, is that when I run "free -h", the cache section starts to grow rapidly, and after 2.5 GB the error is thrown, even if I tell neo4j not to use the OS buffers and caches. I am on Ubuntu Linux, JDK 1.8 (64-bit), with 8 GB of RAM.
I was able to feed the 20 million records into my other machine (OS X Mavericks, JDK 1.8 64-bit, 4 GB of RAM). So I wonder what goes wrong on Ubuntu? Has anyone encountered this problem? I cannot find any similar cases on the internet. I would really appreciate it if you could point at a possible solution or give useful links.
Upvotes: 1
Views: 84
Reputation: 21
Almost forgot! I was able to solve the issue.
It turns out the problem was a malformed input file, which contained stray double quotes (") at random places. An example would be a record that looks like this: name, surname, O"something, date. Neo4j treats everything between two double quotes as a single field, including newline characters. So neo4j would consume millions of lines before it met the next " character, and while creating a node it would try to put all of those lines into a single field. If there is not enough heap space to fit all the lines between the quotes, it throws an error like "Java heap space" or "Failed to mark transaction as rollback only"; if there is enough heap space, it silently creates a node with one huge field.
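This runaway-quote behavior is not specific to neo4j; any standard CSV parser does the same. A minimal sketch with Python's csv module (the file content is made up for illustration):

```python
import csv
import io

# Three physical lines, but the opening quote on line 1 has no closing
# quote until line 3, so the parser swallows the intervening lines
# into a single field.
data = ('name,surname,"Osomething,date\n'
        'row2a,row2b,row2c,row2d\n'
        'row3a,row3b,row3c,"end\n')

rows = list(csv.reader(io.StringIO(data)))
print(len(rows))               # one logical record, not three
print(rows[0][2].count('\n'))  # the third field spans the line breaks
```

With millions of lines between a stray quote and the next one, that single field grows until the heap runs out, exactly as described above.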
Putting double quotes around every column, as in "name", "surname", "O"something", "date", does not help either: the embedded quote in "O"something" closes that field early, the quoting gets out of sync for the rest of the record, and a now-unmatched quote again swallows everything up to the next quote character, line breaks included. By the CSV convention, a literal quote inside a quoted field has to be escaped by doubling it (""), which my file did not do.
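A quick illustration of the doubled-quote escaping convention, again using Python's csv module as a stand-in parser:

```python
import csv
import io

# A literal quote inside a quoted field is escaped by doubling it:
# "O""something" parses as the four-character-plus field O"something.
row = next(csv.reader(io.StringIO('name,surname,"O""something",date\n')))
print(row)  # ['name', 'surname', 'O"something', 'date']
```

Had the input file used this escaping, the quoted fields would have parsed cleanly.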
I had to go through the file and replace all double quotes with single quotes using the sed command.
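The replacement can be done with a one-liner along these lines (the filename data.csv is a placeholder; this is GNU sed as shipped with Ubuntu, while BSD sed on OS X needs `-i ''` instead of `-i`):

```shell
# Sample file with a stray double quote, standing in for the real data:
printf 'name,surname,O"something,date\n' > data.csv

# Replace every double quote with a single quote, editing the file in place:
sed -i "s/\"/'/g" data.csv

cat data.csv   # name,surname,O'something,date
```

Note that this only works if single quotes carry no meaning in your data; otherwise deleting the stray quotes or properly escaping them would be safer.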
Upvotes: 1