John Li

Reputation: 55

Building relationships in Neo4j via Py2neo is very slow

We have 5 different types of nodes in the database. The largest type has ~290k nodes; the smallest has only ~3k. Each node type has an id field, and they are all indexed. I am using py2neo to build relationships, but it is very slow (~2 relationships inserted per second).

I used pandas to read from a relationship CSV and iterated over each row, creating one relationship per transaction. I also tried batching 10k creation statements into one transaction, but that did not seem to improve the speed much.

Below is the code:

import pandas as pd
from py2neo import Graph

graph = Graph()  # connects to the local Neo4j instance

df = pd.read_csv(r"C:\relationship.csv", dtype=datatype, skipinitialspace=True, usecols=fields)
df.fillna('', inplace=True)

def f(node_1, rel_type, node_2):
    try:
        tx = graph.begin()
        tx.evaluate('MATCH (a {node_id:$label1}),(b {node_id:$label2}) MERGE (a)-[r:'+rel_type+']->(b)',
                    parameters = {'label1': node_1, 'label2': node_2})
        tx.commit()
    except Exception as e:
        print(str(e))

for index, row in df.iterrows():
    if index % 1000000 == 0:
        print(index)
    try:
        f(row["node_1"], row["rel_type"], row["node_2"])
    except:
        print("error index: " + str(index))

Can someone help me figure out what I did wrong here? Thanks!

Upvotes: 1

Views: 944

Answers (2)

Nigel Small

Reputation: 4495

If your aim is better performance, you will need to consider a different pattern for loading these, i.e. batching. You're currently running one Cypher MERGE statement per relationship, each wrapped in its own transaction in a separate function call.

Batching these by looking at multiple statements per transaction or per function call will reduce the number of network hops and should improve performance.
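As an illustration, here is a minimal sketch of that batching pattern, assuming `graph` is a connected py2neo Graph and `df` is the DataFrame from the question. Relationship types cannot be parameterized in Cypher, so rows are grouped by type first, then sent in chunks via UNWIND so each transaction carries many relationships instead of one (the function name and batch size are my own choices, not from the question):

```python
import pandas as pd

def merge_relationships(graph, df, batch_size=10000):
    # Relationship types cannot be parameterized, so build one
    # query string per distinct rel_type.
    for rel_type, group in df.groupby("rel_type"):
        rows = group[["node_1", "node_2"]].to_dict("records")
        query = (
            'UNWIND $rows AS row '
            'MATCH (a {node_id: row.node_1}), (b {node_id: row.node_2}) '
            'MERGE (a)-[r:' + rel_type + ']->(b)'
        )
        # One transaction per chunk: many relationships per network hop.
        for i in range(0, len(rows), batch_size):
            tx = graph.begin()
            tx.run(query, rows=rows[i:i + batch_size])
            tx.commit()
```

With ~10k rows per transaction, the per-statement and per-transaction network overhead is amortized across the whole batch.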

Upvotes: 0

cybersam

Reputation: 67044

You state that there are "5 different types of nodes" (which I interpret to mean 5 node labels, in neo4j terminology). And, furthermore, you state that their id properties are already indexed.

But your f() function is not generating a Cypher query that uses the labels at all, and neither does it use the id property. In order to take advantage of your indexes, your Cypher query has to specify the node label and the id value.

Since there is currently no efficient way to parameterize the label when performing a MATCH, the following version of the f() function generates a Cypher query that has hardcoded labels (as well as a hardcoded relationship type):

def f(label_1, id_1, rel_type, label_2, id_2):
    try:
        tx = graph.begin()
        tx.evaluate(
                'MATCH' +
                '(a:' + label_1 + '{id:$id1}),' +
                '(b:' + label_2 + '{id:$id2}) ' +
                'MERGE (a)-[r:'+rel_type+']->(b)',
            parameters = {'id1': id_1, 'id2': id_2})
        tx.commit()
    except Exception as e:
        print(str(e))

The code that calls f() will also have to be changed to pass in both the label names and the id values for a and b. Hopefully, your df rows will contain that data (or enough info for you to derive that data).
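For instance, the updated calling loop might look like the sketch below. The label_1 and label_2 column names are assumptions about the CSV layout; substitute whatever columns (or derivation logic) actually hold the labels in your data:

```python
import pandas as pd

def load_relationships(df, f):
    # Pass labels and ids through to the revised f() so the Cypher
    # query can use the indexes. Column names are assumed, not given.
    for index, row in df.iterrows():
        try:
            f(row["label_1"], row["node_1"], row["rel_type"],
              row["label_2"], row["node_2"])
        except Exception as e:
            print("error index: " + str(index) + ": " + str(e))
```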

Upvotes: 1
