Reputation: 6822
My database was affected by the bug in Neo4j 2.1.1 that tends to corrupt the database in areas where many nodes have been deleted. It turns out that most of the affected relationships had already been marked for deletion in my database. I dumped the rest of the data using neo4j-shell with a single query. This gave a 1.5 GB Cypher file that I need to import into a fresh database to get my data back in a healthy data structure.
I have noticed that the dump file contains definitions for (1) schema, (2) nodes and (3) relationships. I have already removed the schema definitions from the file because they can be applied later on. The issue is that the dump file uses a single series of node identifiers (in the format _nodeid) that are defined during node creation and referenced again during relationship creation. Because of this, it seems that all CREATE statements (33,160,527 in my case) need to be run in a single transaction.
My first attempt to do so kept the server busy for 36 hours without results. I had neo4j-shell load the data directly into a new database directory instead of connecting to a server. The data files in the new database directory never showed any sign of receiving data, and the message log showed many messages indicating thread blocks.
I wonder what is the best way of getting this data back into the database? Should I load a specific config file? Do I need to allocate a large Java heap? What is the trick to have such a large dump file loaded into a database?
Upvotes: 5
Views: 2427
Reputation: 6822
Here is what I finally did:
First I identified all unaffected nodes and marked them with one specific label (let's say Carriable). This was a pretty easy process in my case because all the affected nodes had the same label, so, I just excluded this specific label. In my case I did not have to identify the affected relationships separately because all the affected relationships were also connected to nodes from the affected label.
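A minimal sketch of this labelling step, written as a Cypher script that is fed to neo4j-shell. The label Affected and the path /db/oldneo are hypothetical stand-ins for the damaged label and the old database directory; the batch limit of 50000 is likewise an assumption, chosen only to keep each transaction small.

```shell
# Hypothetical names: "Affected" stands in for the label carried by the
# damaged nodes, and /db/oldneo for the old database directory.
DB_PATH="/db/oldneo"
SCRIPT="${TMPDIR:-/tmp}/mark_carriable.cql"

# Label one batch of unaffected nodes that are not yet marked.
# Re-run the script until "labelled" comes back as 0.
cat > "$SCRIPT" <<'EOF'
MATCH (n)
WHERE NOT n:Affected AND NOT n:Carriable
WITH n LIMIT 50000
SET n:Carriable
RETURN count(n) AS labelled;
EOF

# Only call neo4j-shell if it is installed; the database must not be
# in use by a running server when opened with -path.
if command -v neo4j-shell >/dev/null 2>&1; then
  neo4j-shell -path "$DB_PATH" -file "$SCRIPT"
fi
```

Working in batches is a design choice rather than something taken from the original procedure; a single unbatched statement may also work if enough heap is available.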
Then I exported the whole database except the affected nodes and relationships to GraphML using a single query (in neo4j-shell):
export-graphml -o /home/mah/full.gml -t -r match (n:Carriable) optional match (n)-[i]-(:Carriable) return n,i
This took about a half hour to yield a 4GB XML file.
Then I imported the entire GraphML file back into a fresh database:
JAVA_OPTS="-Xmx8G" neo4j-shell -c "import-graphml -c -t -b 10000 -i /home/mah/full.gml" -path /db/newneo
This took yet another half hour to accomplish.
Please note that I allocated more than enough Java heap memory (JAVA_OPTS="-Xmx8G"), used a fairly small batch size (-b 10000) and allowed the use of on-disk caching (-c).
Finally, I removed the unnecessary "Carriable" label and recreated the constraints.
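The cleanup could look roughly like the sketch below. The constraint on Person(id) is purely hypothetical, since the real schema is not shown here; substitute the schema definitions that were stripped from the original dump. The batch limit is also an assumption.

```shell
# /db/newneo is the new database directory used for the import above.
DB_PATH="/db/newneo"
CLEANUP="${TMPDIR:-/tmp}/remove_carriable.cql"
SCHEMA="${TMPDIR:-/tmp}/recreate_schema.cql"

# Remove the temporary label one batch at a time.
# Re-run this script until "cleaned" comes back as 0.
cat > "$CLEANUP" <<'EOF'
MATCH (n:Carriable)
WITH n LIMIT 50000
REMOVE n:Carriable
RETURN count(n) AS cleaned;
EOF

# Hypothetical constraint; replace with the real schema definitions.
cat > "$SCHEMA" <<'EOF'
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;
EOF

# Only call neo4j-shell if it is installed.
if command -v neo4j-shell >/dev/null 2>&1; then
  neo4j-shell -path "$DB_PATH" -file "$CLEANUP"
  neo4j-shell -path "$DB_PATH" -file "$SCHEMA"
fi
```

The schema script is kept separate so that the label cleanup can be repeated without trying to create the same constraint twice.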
Upvotes: 2
Reputation: 41686
The dump command is not meant for larger-scale exports. There was originally a version that could handle them, but it was not included in the product.
If you still have the old database around, you can try some things:
load csv command or the batch-importer
Upvotes: 4