Neo4j data loading performance: driver vs custom procedure

Question

I'm switching from Neo4j Java custom procedures to an approach based on the Neo4j Java driver. I want to end up with some sort of microservice running my graph algorithm, instead of calling the custom procedure through Cypher. I implemented the traversal using a bunch of standard HashMaps. Once the data is loaded from Neo4j into these HashMaps, the graph traversal is much faster than in my original custom procedure, so that's very promising.
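To make the setup concrete, here is a minimal sketch of what a traversal over plain HashMaps can look like. It is an illustration only: the class name AdjacencyTraversal, the method reachableFrom, and the adjacency map layout (node id mapped to a list of neighbour ids) are hypothetical and not taken from the actual service.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: names and map layout are assumptions, not the original code.
final class AdjacencyTraversal {

    // Breadth-first search over an adjacency map keyed by node id.
    // Returns every node id reachable from start, including start itself.
    static Set<Long> reachableFrom(Map<Long, List<Long>> outgoing, long start) {
        Set<Long> visited = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            long current = queue.poll();
            // Nodes without outgoing edges have no map entry, hence the empty default.
            for (long next : outgoing.getOrDefault(current, List.of())) {
                if (visited.add(next)) { // add() returns false for already visited ids
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> outgoing = new HashMap<>();
        outgoing.put(1L, List.of(2L, 3L));
        outgoing.put(2L, List.of(4L));
        outgoing.put(5L, List.of(1L));
        System.out.println(reachableFrom(outgoing, 1L));
    }
}
```

Once the maps are filled, a traversal like this only touches in-memory data, which is why the loading step is the part that matters for start-up time.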

Now to my question: within the custom procedure I was able to load the graph (40 million edges, 10 million nodes) into the HashMaps like so:

@Context
public GraphDatabaseService db;
...

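// Node id -> value of the "combProp" property
HashMap<Long, Long> mapNodeIdProperty = new HashMap<>();
// try-with-resources closes the transaction once loading is done
try (Transaction tx = db.beginTx()) {
    tx.getAllNodes().stream().forEach((org.neo4j.graphdb.Node n)
            -> mapNodeIdProperty.put(n.getId(), (Long) n.getProperty("combProp")));
}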

This takes about a minute, which I consider acceptable as the service's start-up time.

Now, the best solution I could find using the driver goes something like this:

driver = GraphDatabase.driver( uri, AuthTokens.basic( user, password ) );

...

try ( Session session = driver.session() )
{
    // The query only reads data, so a read transaction is sufficient
    String status = session.readTransaction( new TransactionWork<String>()
    {
        @Override
        public String execute( Transaction tx )
        {
            // Collect every returned node into listNodes
            tx.run( "MATCH (n) RETURN n" ).stream()
              .forEach( record -> listNodes.add( record.get( "n" ).asNode() ) );
            return "ok; length=" + listNodes.size();
        }
    });
    System.out.println( status );
}

This takes too long, even if I limit the number of returned nodes to just a few thousand in the Cypher query. I never waited for the full graph to load.

What would be my best bet for getting the same speed with the driver (or an alternative approach) as with the custom procedure? Are there fundamental limitations that would prevent this?

mcsoini · Accepted Answer

As InverseFalcon points out, the transfer over the Bolt protocol comes with speed limitations that are not present in procedures. What I was looking for is the embedded DBMS described here. I am now using it like this:

public class GraphAccess implements AutoCloseable
{
    ...

    private final DatabaseManagementService managementService =
        new DatabaseManagementServiceBuilder( NEO4J_DATA_PATH )
            .setConfig( GraphDatabaseSettings.read_only, true )
            .setConfig( GraphDatabaseSettings.logs_directory, NEO4J_LOGS_DIRECTORY )
            .build();
    public final GraphDatabaseService db = managementService.database( NEO4J_DATABASE_NAME );

    // Node id -> value of the "name" property
    public final HashMap<Integer, String> mapNodeProp = new HashMap<>();

    ...

    private void getNodeData() {

        try ( Transaction tx = db.beginTx() ) {
            tx.getAllNodes().forEach((Node n) -> {

                // Fails loudly if an id does not fit into an int, instead of silently truncating it
                final Integer nodeId = Math.toIntExact(n.getId());
                mapNodeProp.put(nodeId, (String) n.getProperty("name"));

            });
        }
    }
}

This has the advantage of high throughput (since it doesn't go through the driver but accesses the Neo4j database data directly), and it can be used within a Spring app (since it's not implemented as a custom procedure).
