retrography

Reputation: 6822

Loading Neo4j database dump (neo4j-shell)

My database was affected by the bug in Neo4j 2.1.1 that tends to corrupt the database in areas where many nodes have been deleted. It turns out that most of the affected relationships in my database were already marked for deletion. I dumped the rest of the data with a single query in neo4j-shell, which produced a 1.5G Cypher file that I now need to import into a mint database to get my data back into a healthy structure.
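
For reference, the dump was produced along these lines (a sketch; the paths and the exact match pattern are illustrative, not my actual command):

# Illustrative: dump the query result as Cypher CREATE statements into a file
neo4j-shell -path /db/oldneo -c "dump match (n) optional match (n)-[r]->() return n,r" > /home/mah/dump.cypher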

I have noticed that the dump file contains definitions for (1) the schema, (2) nodes and (3) relationships. I have already removed the schema definitions from the file, because they can be applied later on. Now the issue is that the dump file uses a single series of identifiers (of the form _nodeid) across both node creation and relationship creation, so it seems that all the CREATE statements (33,160,527 in my case) need to run in a single transaction.
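
To make the identifier problem concrete, the CREATE statements in the dump look roughly like this (invented values); the _1-style identifiers are only resolvable within the transaction that creates them:

begin
create (_1:`User` {`name`:"alice"})
create (_2:`User` {`name`:"bob"})
create (_1)-[:`FOLLOWS`]->(_2)
commit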

My first attempt to do so kept the machine busy for 36 hours with no result. I had neo4j-shell load the data directly into a new database directory rather than connect to a running server. The data files in the new directory never showed any sign of receiving data, and the message log was full of messages indicating blocked threads.

I wonder: what is the best way to get this data back into the database? Should I load a specific config file? Do I need to allocate a larger Java heap? What is the trick to loading such a large dump file into a database?

Upvotes: 5

Views: 2427

Answers (2)

retrography

Reputation: 6822

Here is what I finally did:

First I identified all the unaffected nodes and marked them with one specific label (let's say Carriable). This was a pretty easy process in my case, because all the affected nodes had the same label, so I just excluded that label. I did not have to identify the affected relationships separately, because every affected relationship was also connected to a node with the affected label.
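
A minimal sketch of that step, assuming the damaged nodes all carry a hypothetical label Affected (both label names are placeholders for your own):

// Hypothetical labels: every node not marked Affected receives the helper label
MATCH (n) WHERE NOT n:Affected SET n:Carriable;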

Then I exported the whole database except the affected nodes and relationships to GraphML using a single query (in neo4j-shell):

export-graphml -o /home/mah/full.gml -t -r match (n:Carriable) optional match (n)-[i]-(:Carriable) return n,i

This took about half an hour and yielded a 4GB XML file.

Then I imported the entire GraphML back into a mint database:

JAVA_OPTS="-Xmx8G" neo4j-shell -c "import-graphml -c -t -b 10000 -i /home/mah/full.gml" -path /db/newneo

This took yet another half hour to accomplish.

Please note that I allocated more than sufficient Java heap memory (JAVA_OPTS="-Xmx8G"), imposed a particularly small batch size (-b 10000), and allowed the use of on-disk caching (-c).

Finally, I removed the unnecessary "Carriable" label and recreated the constraints.
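
The cleanup was along these lines (a sketch; the constraint shown is a placeholder, since the real labels and properties depend on your schema):

// Drop the temporary helper label from every node
MATCH (n:Carriable) REMOVE n:Carriable;

// Recreate the schema; Person/name stand in for the actual constraints
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;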

Upvotes: 2

Michael Hunger

Reputation: 41686

The dump command is not meant for larger-scale exports. There was originally a version that handled them, but it was not included in the product.

If you still have the old database around, you can try a few things:

  • contact Neo4j support to help you recover your data
  • use my store-utils to copy it over to a new db (it will skip all broken records)
  • query the data with Cypher and export the results as CSV
    • you could use the shell-import-tools for that
    • and then import your data from the CSV using either the shell tools again, the LOAD CSV command, or the batch-importer (see the sketch below)
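
A minimal sketch of the LOAD CSV route, assuming a hypothetical nodes.csv with a header row and a name column (the file path, label, and property are placeholders):

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/mah/nodes.csv" AS row
CREATE (n:Person {name: row.name});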

Upvotes: 4
