Kevin

Reputation: 775

Neo4j traversal running out of memory

I have a Neo4j database containing some 120 million nodes. I am using the traversal framework to traverse my graph and count the occurrence of certain nodes. This works like a charm. Unfortunately, when I run my code on the entire dataset, I run out of memory.

I have already allocated 4 GB to the Java VM, and I believe I commit my transactions (using tx.success() in a try-with-resources statement), but I still fill up my heap quite fast.

Below you can find my code. First I collect about 40 versions (these are root nodes). Then, for each of these, I look up all neighbouring child nodes. For each of these children (files) I check the entire subtree for the occurrence of a certain node.

I was under the impression that using

    try (Transaction tx = db.beginTx()) {
        // work with the graph here
        tx.success();
    }

automatically closes my transaction, but my heap still fills up. This makes my query run slowly from the second or third pass over the versions, and eventually it crashes. Am I misunderstanding something? Or is there something else I can do?

    Collection<Node> versions;
    Collection<Node> files;

    try (Transaction ignored = db.beginTx()) {
        versions = IteratorUtil.asCollection(db.traversalDescription()
                .breadthFirst()
                .relationships(ProjectRelations.HASVERSION, Direction.OUTGOING)
                .evaluator(Evaluators.toDepth(1))
                .evaluator(Evaluators.excludeStartPosition())
                .traverse(db.getNodeById(0))
                .nodes());
        ignored.success();
    }

    for (Node v : versions) {
        // collect all file nodes directly reachable from this version
        try (Transaction tx = db.beginTx()) {
            files = IteratorUtil.asCollection(db.traversalDescription()
                    .breadthFirst()
                    .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                    .evaluator(Evaluators.excludeStartPosition())
                    .traverse(v)
                    .nodes());
            tx.success();
        }

        for (Node f : files) {
            // scan the subtree below each file for the node of interest
            try (Transaction t = db.beginTx()) {
                for (Node node : db.traversalDescription()
                        .depthFirst()
                        .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                        .evaluator(Evaluators.excludeStartPosition())
                        .evaluator(e)
                        .traverse(f)
                        .nodes()) {
                    // do some stuff
                }
                t.success();
            }
        }

        files.clear();
    }
    versions.clear();

Update:

I replaced everything with iterators, like this:

    for (Node v : versions) {
        try (Transaction tx = db.beginTx();
             ResourceIterator<Node> files = db.traversalDescription()
                     .breadthFirst()
                     .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                     .evaluator(Evaluators.excludeStartPosition())
                     .traverse(v)
                     .nodes()
                     .iterator()) {

            while (files.hasNext()) {
                Node f = files.next();

                try (Transaction t = db.beginTx();
                     ResourceIterator<Node> children = db.traversalDescription()
                             .depthFirst()
                             .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                             .evaluator(Evaluators.excludeStartPosition())
                             .evaluator(e)
                             .traverse(f)
                             .nodes()
                             .iterator()) {

                    while (children.hasNext()) {
                        Node tempNode = children.next();
                        // do some stuff
                    }
                    // no explicit close() needed: try-with-resources closes the iterator
                }
            }
        }
    }

The problem is that the transaction keeps everything in memory until I exhaust the iterator or close() it.
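As a minimal sketch of that pattern (reusing db, f, e, and RelTypes.ISPARENTOF from the snippets above, so the names are only illustrative): scope the ResourceIterator to the same try-with-resources as the transaction and count in place, so nothing is collected and only the traversal's own bookkeeping stays on the heap.

    try (Transaction tx = db.beginTx();
         ResourceIterator<Node> it = db.traversalDescription()
                 .depthFirst()
                 .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                 .evaluator(Evaluators.excludeStartPosition())
                 .evaluator(e)
                 .traverse(f)
                 .nodes()
                 .iterator()) {
        long count = 0;
        while (it.hasNext()) {
            it.next(); // visit the node; nothing is retained
            count++;
        }
        tx.success();
    }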

Edit 2:

I used iterators for everything, using depth-first traversal. I also reduced the available heap memory from 4 GB to 1024 MB. For now it seems to be running (though I'm not sure it will complete entirely), albeit very slowly. It climbs to about 980 MB but does not cross that threshold (yet). I do suffer an enormous slowdown, though, because my heap is as good as full the entire time. Any ideas to improve on this? Or is this the best I'm going to get?

    try (Transaction tx = db.beginTx()) {
        versions = IteratorUtil.asCollection(db
                .traversalDescription()
                .depthFirst()
                .relationships(ProjectRelations.HASVERSION, Direction.OUTGOING)
                .evaluator(Evaluators.toDepth(1))
                .evaluator(Evaluators.excludeStartPosition())
                .traverse(root)
                .nodes());
    }

    int mb = 1024 * 1024;
    Runtime runtime = Runtime.getRuntime();

    try (Transaction tx = db.beginTx()) {
        int idx = 0;
        for (Relationship rel : root.getRelationships(ProjectRelations.HASVERSION, Direction.OUTGOING)) {
            idx++;
            System.out.println(idx);
            Node v = rel.getEndNode();

            try (ResourceIterator<Node> files = db.traversalDescription()
                    .depthFirst()
                    .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                    .evaluator(Evaluators.excludeStartPosition())
                    .uniqueness(Uniqueness.NONE)
                    .traverse(v)
                    .nodes()
                    .iterator()) {

                while (files.hasNext()) {
                    Node f = files.next();

                    try (ResourceIterator<Node> node = db.traversalDescription()
                            .depthFirst()
                            .relationships(RelTypes.ISPARENTOF, Direction.OUTGOING)
                            .evaluator(Evaluators.excludeStartPosition())
                            .evaluator(e)
                            .traverse(f)
                            .nodes()
                            .iterator()) {
                        while (node.hasNext()) {
                            node.next();
                        }
                    }
                }
            }

            System.out.println("Used Memory: "
                    + (runtime.totalMemory() - runtime.freeMemory()) / mb);
            System.out.println("Total Memory: " + runtime.totalMemory() / mb);
        }
    }

    db.shutdown();

The exception thrown:

Exception in thread "GC-Monitor" Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at ch.qos.logback.core.pattern.FormattingConverter.write(FormattingConverter.java:40)
at ch.qos.logback.core.pattern.PatternLayoutBase.writeLoopOnConverters(PatternLayoutBase.java:119)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:168)
at ch.qos.logback.classic.PatternLayout.doLayout(PatternLayout.java:59)
at ch.qos.logback.core.encoder.LayoutWrappingEncoder.doEncode(LayoutWrappingEncoder.java:134)
at ch.qos.logback.core.OutputStreamAppender.writeOut(OutputStreamAppender.java:188)
at ch.qos.logback.core.FileAppender.writeOut(FileAppender.java:206)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:212)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:88)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:48)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:272)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:259)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:441)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:395)
at ch.qos.logback.classic.Logger.warn(Logger.java:708)
at org.neo4j.kernel.logging.LogbackService$Slf4jToStringLoggerAdapter.warn(LogbackService.java:240)
at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:84)

java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(Unknown Source)
at org.neo4j.kernel.impl.core.RelationshipLoader.getMoreRelationships(RelationshipLoader.java:55)
at org.neo4j.kernel.impl.core.NodeManager.getMoreRelationships(NodeManager.java:779)
at org.neo4j.kernel.impl.core.NodeImpl.loadMoreRelationshipsFromNodeManager(NodeImpl.java:577)
at org.neo4j.kernel.impl.core.NodeImpl.getMoreRelationships(NodeImpl.java:466)
at org.neo4j.kernel.impl.core.NodeImpl.loadInitialRelationships(NodeImpl.java:394)
at org.neo4j.kernel.impl.core.NodeImpl.ensureRelationshipMapNotNull(NodeImpl.java:372)
at org.neo4j.kernel.impl.core.NodeImpl.getAllRelationshipsOfType(NodeImpl.java:219)
at org.neo4j.kernel.impl.core.NodeImpl.getRelationships(NodeImpl.java:325)
at org.neo4j.kernel.impl.core.NodeProxy.getRelationships(NodeProxy.java:154)
at org.neo4j.kernel.StandardExpander$RegularExpander.doExpand(StandardExpander.java:583)
at org.neo4j.kernel.StandardExpander$RelationshipExpansion.iterator(StandardExpander.java:195)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationshipsWithoutChecks(TraversalBranchImpl.java:115)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.expandRelationships(TraversalBranchImpl.java:104)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.initialize(TraversalBranchImpl.java:131)
at org.neo4j.kernel.impl.traversal.TraversalBranchImpl.next(TraversalBranchImpl.java:151)
at org.neo4j.graphdb.traversal.PreorderDepthFirstSelector.next(PreorderDepthFirstSelector.java:49)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:68)
at org.neo4j.kernel.impl.traversal.MonoDirectionalTraverserIterator.fetchNextOrNull(MonoDirectionalTraverserIterator.java:35)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at org.neo4j.kernel.impl.traversal.DefaultTraverser$ResourcePathIterableWrapper$1.fetchNextOrNull(DefaultTraverser.java:140)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
at main.QueryExecutor.main(QueryExecutor.java:173)

Upvotes: 1

Views: 523

Answers (2)

Kevin

Reputation: 775

I fixed my problem by setting the cache_type option to none. It no longer runs out of memory and completes in about an hour.
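For reference, in an embedded setup the same option can be passed when the database is built. A minimal sketch, assuming a Neo4j 2.x GraphDatabaseFactory with GraphDatabaseSettings.cache_type available in your version (the store path is a placeholder):

    // Sketch: build an embedded database with the object cache disabled.
    // "/path/to/graph.db" is a placeholder for the actual store directory.
    GraphDatabaseService db = new GraphDatabaseFactory()
            .newEmbeddedDatabaseBuilder("/path/to/graph.db")
            .setConfig(GraphDatabaseSettings.cache_type, "none")
            .newGraphDatabase();

Setting cache_type=none in neo4j.properties should have the same effect for a server installation.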

Upvotes: 0

Davide Grohmann

Reputation: 11

It looks like you are eagerly consuming the whole iterator when you perform the second traversal by using IteratorUtil.asCollection(). I am unsure how many nodes are produced in that case, but if there are many (i.e., millions), it is likely to cause an out-of-memory problem.
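A minimal sketch of the lazy alternative, borrowing the identifiers from the question: iterate the traverser directly inside the transaction, so only one node is live at a time instead of the whole collection.

    try (Transaction tx = db.beginTx()) {
        for (Node n : db.traversalDescription()
                .breadthFirst()
                .relationships(ProjectRelations.FILES, Direction.OUTGOING)
                .evaluator(Evaluators.excludeStartPosition())
                .traverse(v)
                .nodes()) {
            // process n here instead of storing it in a collection;
            // each node becomes collectable once the loop moves on
        }
        tx.success();
    }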

Upvotes: 1
