Reputation: 584
I have an m1.large Neo4j Server instance on Amazon EC2 which I set up according to the instructions here: http://www.neo4j.org/develop/ec2
I did not deviate from that setup in any way.
It has been working mostly flawlessly for several weeks, with the occasional restart. However, I was unable to connect to it from my web app this morning (3/13/2013).
I restarted the Neo4j Server instance from the EC2 management console. After it rebooted, my web app seemed able to make the initial connection (via Neoid & Neography), so the app would at least boot.
However, all transactions were then failing. I tried accessing the Neo4j admin console on port 7474 at webadmin/, and this is the error I see, in particular:
javax.transaction.SystemException: TM has encountered some problem, please perform neccesary action (tx recovery/restart)
Restarting is not the solution.
Full error trace for Attempt #1 when accessing web panel: GIST #1.
I found a thread that referenced a seemingly related problem, which indicated that starting neo4j in console mode will allow for a full recovery without timing out, so I tried it, with SEVERE results, shedding more light on my problem:
It looks like this is the root cause:
Caused by: java.io.IOException: Unknown xid for identifier 8964
Full error trace for Attempt #2, running sudo /var/lib/neo4j/bin/neo4j console: GIST #2
This is pre-production data, so I have the luxury of drastic measures. I deleted the database and started over.
sudo rm -rf /var/lib/neo4j/data/graph.db/
sudo /var/lib/neo4j/bin/neo4j start
I was able to create about 50k rels & 50k nodes, and then the errors came back after, at most, a few hours.
I stopped the Neo4j server and started it in console mode to do recovery.
Full trace of Attempt #4 recovery, running sudo /var/lib/neo4j/bin/neo4j console: GIST #3
Recovery worked, so I restarted the server as a daemon.
Full trace of Attempt #4 starting daemon, running sudo /var/lib/neo4j/bin/neo4j start: GIST #4
It worked for a few minutes. And then back to this error again:
TM has encountered some problem, please perform neccesary action (tx recovery/restart)
Full trace of new error as seen from Neography's attempt to execute a script: GIST #5
I now suspect that, even though I used the vanilla Neo4j installed by following this guide for an m1.large server, there are some problems with this configuration. When I start the server in console mode, these are the worrying messages I see:
INFO ... Could NOT find resource [logback.groovy]
INFO ... Could NOT find resource [logback-test.xml]
ERROR ... Could not find resource corresponding to [custom-logback.xml]
And this one:
WARNING! Deprecated configuration options used. See manual for details
cannot configure writers and searchers individually since they go together
Update: I have filed a separate issue for these default configuration problems.
Upvotes: 1
Views: 707
Reputation: 584
The problem was the root device running out of space.
I have resolved my problems so here is a full history and explanation of the fix: http://github.com/neo4j-contrib/neo4j-puppet/issues/3
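Since the cause turned out to be a full root device, a quick disk check is worth doing before attempting any recovery. The following is only a minimal sketch: the helper name usage_percent, the 90% warning threshold, and the NEO4J_DATA variable are my own illustrative choices, and the data path is simply the one used earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage used on the filesystem holding $1.
# df -P gives POSIX output, one line per filesystem, with capacity in column 5.
usage_percent() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Path taken from the post; adjust if your install differs.
NEO4J_DATA=/var/lib/neo4j/data
# Arbitrary threshold chosen for illustration only.
WARN_AT=90

# Human-readable overview of the root device.
df -h /

pct=$(usage_percent /)
if [ "$pct" -ge "$WARN_AT" ]; then
    echo "WARNING: root filesystem is ${pct}% full"
fi

# Show how much of that space the Neo4j data directory takes (may need sudo).
if [ -d "$NEO4J_DATA" ]; then
    du -sh "$NEO4J_DATA"
fi
```

If the root device is close to full, freeing space or moving the data directory to a larger volume should come before another recovery attempt.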
Upvotes: 1