Slow performance while saving data in Neo4j Enterprise Edition (trial version)

Question

We built a 3-node cluster in a test environment and used a Neo4j-JDBC connection to save JSON data into Neo4j.

When creating just 2000 nodes and 2000 relationships from the JSON, the statistics are: total time to save topology data in Neo4j: 456688 ms (links size: 2000, nodes size: 2000).

Saving without checking for duplicate nodes/relationships (with the checkVertex and checkRelation methods removed):

Total time to save topology data in Neo4j: 446979 ms (links size: 2000, nodes size: 4000; because duplicates are no longer checked, twice as many nodes were created).
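A quick back-of-the-envelope check of the second run, assuming every link issued exactly three write statements (source node, target node, relationship), which is consistent with the 4000 nodes created:

$$
\frac{446979\,\text{ms}}{2000\ \text{links}} \approx 223.5\,\text{ms per link},
\qquad
\frac{446979\,\text{ms}}{2000 \times 3\ \text{statements}} \approx 74.5\,\text{ms per statement}
$$

Dropping the duplicate-check reads saved only about 9.7 s in total (456688 ms vs. 446979 ms), which suggests that per-statement overhead, rather than the lookups alone, accounts for most of the time.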

Code:

public Connection getConnection(String masterNodeIp, String password) throws Exception {
    return (Connection) DriverManager.getConnection("jdbc:neo4j:http://" + masterNodeIp + "/?user=neo4j,password=" + password);
}

// Iterate over the links (edges), creating the source node, the target node, and the relationship for each link.

try {
    for (Links link : topology.getL2links()) {
        if (conn != null) {
            long srcId = etGraphIdByUniquenessOfOrphan(clientId, link.getSrcMgmtIP());
            GraphId srcGraphId = prepareGraphId(srcId, "DEVICE");
            long tgtId = etGraphIdByUniquenessOfOrphan(clientId, link.getTgtMgmtIP());
            GraphId tgtGraphId = prepareGraphId(tgtId, "DEVICE");
            String srcQuery = createNode(conn, link, false, clientId, discProfileId, srcGraphId);
            if (srcQuery != null && !srcQuery.isEmpty())
                stmt.execute(srcQuery);
            String tgtQuery = createNode(conn, link, true, clientId, discProfileId, tgtGraphId);
            if (tgtQuery != null && !tgtQuery.isEmpty())
                stmt.execute(tgtQuery);
            String relationQuery = processRelation(conn, link, srcGraphId, tgtGraphId);
            if (relationQuery != null && !relationQuery.isEmpty())
                stmt.execute(relationQuery);
        }
    }
} catch (Exception e) {
    System.out.println("Exception in processJsonData ::: " + e.getMessage());
    throw e;
} finally {
    if (stmt != null) stmt.close();
    if (conn != null) conn.close();
}

// Before creating a node, check whether it already exists to avoid duplicates.

private boolean checkVertex(Connection conn, String ip, String hostName, long clientId, long discPId, GraphId graphId) throws Exception {
    Statement stmt = null;
    ResultSet rs = null;
    boolean result = false;
    try {
        stmt = conn.createStatement();
        StringBuffer queryBuffer = new StringBuffer();
        queryBuffer.append(" MATCH (node) WHERE node.id ='" + graphId.getId() + "' AND node.sourceType = '" + graphId.getSourceType() + "'");
        queryBuffer.append(" RETURN node");
        rs = (ResultSet) stmt.executeQuery(queryBuffer.toString());
        while (rs.next()) {
            result = true;
            break;
        }
    } catch (Exception e) {
        System.out.println("Exception in fetching node ::: " + e.getMessage());
        throw e;
    } finally {
        if (rs != null) rs.close();
        if (stmt != null) stmt.close();
    }

    return result;
}
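A minimal sketch of the same existence check written the way the answer below recommends: a label, bound parameters instead of concatenated literals, and try-with-resources. It assumes the nodes carry the `resource` label (as in the checkRelation query), that `id` is stored as a string (the original query quotes it), and that the neo4j-jdbc driver accepts `?` placeholders; `VertexLookup` and `vertexExists` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical parameterized version of checkVertex (not the original code).
class VertexLookup {
    // The :resource label lets Neo4j use an index or constraint defined on :resource;
    // the "?" placeholders are bound below instead of concatenating literal values.
    static final String EXISTS_QUERY =
        "MATCH (node:resource) WHERE node.id = ? AND node.sourceType = ? "
      + "RETURN node.id LIMIT 1";

    static boolean vertexExists(Connection conn, String id, String sourceType) throws SQLException {
        // try-with-resources closes the statement and result set on every path
        // and never calls close() on a null reference.
        try (PreparedStatement stmt = conn.prepareStatement(EXISTS_QUERY)) {
            stmt.setString(1, id);
            stmt.setString(2, sourceType);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next(); // true if at least one matching node came back
            }
        }
    }
}
```

A call site could look like `vertexExists(conn, String.valueOf(graphId.getId()), graphId.getSourceType())`, assuming `getSourceType()` returns a `String`.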

// Before creating a relationship, also check for duplicate relationships.

private boolean checkRelation(Connection conn, Links link, GraphId srcGraphId, GraphId tgtGraphId) throws SQLException {
    Statement stmt = null;
    ResultSet rs = null;
    boolean result = false;
    try {
        stmt = conn.createStatement();
        StringBuffer queryBuffer = new StringBuffer();
        queryBuffer.append(" MATCH (src:resource)-[r:topology]->(tgt:resource) WHERE src.id='" + srcGraphId.getId()
            + "' AND tgt.id='" + tgtGraphId.getId() + "' AND r.srcInt='" + link.getSrcInt() + "' AND r.tgtInt='" + link.getTgtInt() + "'");
        queryBuffer.append(" RETURN r");
        rs = (ResultSet) stmt.executeQuery(queryBuffer.toString());
        while (rs.next()) {
            result = true;
            break;
        }
    } catch (SQLException e) {
        System.out.println("Exception in fetching relationship ::: " + e.getMessage());
        throw e;
    } finally {
        if (rs != null) rs.close();
        if (stmt != null) stmt.close();
    }
    return result;
}

We created indexes for the queries used in these duplicate checks, but performance is still slow.

Also, please let us know how to use a "Node key" (unique) constraint at the Java level so that we can skip the checkVertex query. We tried catching the ConstraintViolationException and logging it instead of rethrowing it, but the exception is still thrown and no nodes are saved.
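A sketch of declaring such a constraint once from Java through the same JDBC connection. It assumes Neo4j 3.3+ Enterprise Edition syntax (Neo4j 5 uses `CREATE CONSTRAINT ... FOR ... REQUIRE ... IS NODE KEY` instead), the `resource` label, and the `id`/`sourceType` properties from checkVertex as the key; `ResourceSchema` is a hypothetical name. Creating the constraint fails if existing `:resource` nodes already violate it, and once it exists a plain CREATE of an existing key fails, so the writes themselves need MERGE to skip duplicates without an exception.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical one-time schema setup, run before the import rather than per node.
class ResourceSchema {
    // Neo4j 3.3+ Enterprise syntax for a composite node key on (id, sourceType).
    // Neo4j 5 equivalent (not used here):
    //   CREATE CONSTRAINT resource_key IF NOT EXISTS
    //   FOR (n:resource) REQUIRE (n.id, n.sourceType) IS NODE KEY
    static final String NODE_KEY =
        "CREATE CONSTRAINT ON (n:resource) ASSERT (n.id, n.sourceType) IS NODE KEY";

    static void createNodeKey(Connection conn) throws SQLException {
        // Kept as its own statement: Neo4j does not allow schema changes and
        // data writes in the same transaction.
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(NODE_KEY);
        }
    }
}
```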

Michael Hunger · Accepted Answer

There are a lot of things that you can improve:

- For mass data imports, use the Java driver directly; JDBC adds an extra layer of indirection.
- Use parameters!
- Use batching, either with UNWIND or by executing multiple prepared statements as a batch.
- Don't construct queries with literal values.
- Make sure you have indexes/constraints for your keys. Your checkVertex query can't use any index because it doesn't specify a label!
- Use MERGE if you don't want to get constraint exceptions.
- Don't use StringBuffer, ever.
- Use try-with-resources.
- Use executeUpdate.
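A sketch combining several of these points for the JDBC path: one parameterized MERGE statement per link (replacing the separate check and create queries), queued with `addBatch()` and executed with `executeBatch()` inside try-with-resources. The `resource`/`topology` labels and the `id`, `sourceType`, `srcInt`, `tgtInt` properties come from the question's queries; `LinkWriter`, `LinkRow`, and the driver's `?` placeholder support are assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical replacement for the per-link createNode/checkVertex/checkRelation round trips.
class LinkWriter {
    // MERGE matches the nodes and the relationship if they exist and creates them
    // otherwise, so no separate existence check is needed.
    static final String MERGE_LINK =
        "MERGE (src:resource {id: ?, sourceType: ?}) "
      + "MERGE (tgt:resource {id: ?, sourceType: ?}) "
      + "MERGE (src)-[r:topology {srcInt: ?, tgtInt: ?}]->(tgt)";

    // Minimal stand-in for the question's Links/GraphId data (hypothetical).
    static final class LinkRow {
        final String srcId, tgtId, srcInt, tgtInt;

        LinkRow(String srcId, String tgtId, String srcInt, String tgtInt) {
            this.srcId = srcId;
            this.tgtId = tgtId;
            this.srcInt = srcInt;
            this.tgtInt = tgtInt;
        }
    }

    static int[] writeLinks(Connection conn, List<LinkRow> links, String sourceType) throws SQLException {
        // One prepared statement for all links; try-with-resources closes it on every path.
        try (PreparedStatement ps = conn.prepareStatement(MERGE_LINK)) {
            for (LinkRow link : links) {
                ps.setString(1, link.srcId);
                ps.setString(2, sourceType);
                ps.setString(3, link.tgtId);
                ps.setString(4, sourceType);
                ps.setString(5, link.srcInt);
                ps.setString(6, link.tgtInt);
                ps.addBatch(); // queue this parameter set instead of executing it immediately
            }
            return ps.executeBatch(); // execute all queued parameter sets as one batch
        }
    }
}
```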

For Batching: https://medium.com/@mesirii/5-tips-tricks-for-fast-batched-updates-of-graph-structures-with-neo4j-and-cypher-73c7f693c8cc
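The UNWIND variant sends a whole list of links as one parameter, so a single statement covers many links. This sketch only prepares the batches and the query; the field names, the batch size, and the class name are illustrative assumptions, and the commented driver call assumes the official Java driver's `Session.run(String, Map<String, Object>)` method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the UNWIND batching pattern.
class UnwindBatching {
    // One query handles a whole batch: each map in $rows becomes one `row`.
    static final String BATCH_QUERY =
        "UNWIND $rows AS row "
      + "MERGE (src:resource {id: row.srcId, sourceType: row.sourceType}) "
      + "MERGE (tgt:resource {id: row.tgtId, sourceType: row.sourceType}) "
      + "MERGE (src)-[r:topology {srcInt: row.srcInt, tgtInt: row.tgtInt}]->(tgt)";

    // Converts one link into a plain map so a batch can be passed as a list-of-maps parameter.
    static Map<String, Object> row(String srcId, String tgtId, String sourceType,
                                   String srcInt, String tgtInt) {
        Map<String, Object> m = new HashMap<>();
        m.put("srcId", srcId);
        m.put("tgtId", tgtId);
        m.put("sourceType", sourceType);
        m.put("srcInt", srcInt);
        m.put("tgtInt", tgtInt);
        return m;
    }

    // Splits the rows into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Parameter map for one batch. With the official Java driver this would be sent as, e.g.,
    //   session.run(BATCH_QUERY, batchParameters(batch));
    static Map<String, Object> batchParameters(List<Map<String, Object>> batch) {
        return Collections.singletonMap("rows", batch);
    }
}
```

With 2000 links and a batch size of 500, `partition` yields 4 batches, so the import runs 4 statements instead of at least one per link.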

For parameters: http://neo4j-contrib.github.io/neo4j-jdbc/#_minimum_viable_snippet
