Jausk

Reputation: 325

Efficient way to create relationships in neo4j

I have a neo4j database populated with thousands of nodes but no relationships. I have a file that lists the relationships between those nodes, and I would like to create them in the database. My current approach is:

from py2neo import Graph, NodeSelector, Relationship

graph = Graph('http://127.0.0.1:7474/db/data')
tx = graph.begin()
selector = NodeSelector(graph)
with open("file", "r") as relations:
    for line in relations:
        # Strip the trailing newline so the last field can match unique_name
        line_split = line.strip().split(";")
        # Two lookups per line: one for each end of the relationship
        node1 = selector.select("Node", unique_name=line_split[0]).first()
        node2 = selector.select("Node", unique_name=line_split[1]).first()
        rs = Relationship(node1, "Relates to", node2)
        tx.create(rs)
tx.commit()

The current approach needs two database queries per line to fetch the nodes, plus the relationship creation itself. Is there a more efficient way, given that the nodes already exist in the database?

Upvotes: 2

Views: 480

Answers (1)

urban

Reputation: 5682

You can cache the nodes as you create the relationships, so that a node already fetched is not queried again:

from py2neo import Graph, NodeSelector, Relationship

graph = Graph('http://127.0.0.1:7474/db/data')
tx = graph.begin()
selector = NodeSelector(graph)
node_cache = {}  # maps unique_name -> node already fetched from the database

with open("file", "r") as relations:
    for line in relations:
        # Strip the trailing newline so the last field can match unique_name
        line_split = line.strip().split(";")

        # Check if we have this node in the cache
        if line_split[0] in node_cache:
            node1 = node_cache[line_split[0]]
        else:
            # Query and store for later
            node1 = selector.select("Node", unique_name=line_split[0]).first()
            node_cache[line_split[0]] = node1

        # Same check for the node at the other end of the relationship
        if line_split[1] in node_cache:
            node2 = node_cache[line_split[1]]
        else:
            node2 = selector.select("Node", unique_name=line_split[1]).first()
            node_cache[line_split[1]] = node2

        rs = Relationship(node1, "Relates to", node2)
        tx.create(rs)

tx.commit()

With this approach, each node is loaded at most once, and only if it appears in your input file.
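To see the caching effect without a running database, here is a minimal sketch of the same pattern with the lookup pulled out into a function. The names `pair_nodes`, `fetch_node` and `fake_fetch` are made up for illustration; `fake_fetch` is a stand-in that only records which names were requested, where the real code would call `selector.select("Node", unique_name=name).first()`.

```python
def pair_nodes(lines, fetch_node):
    """Turn 'a;b' lines into (node_a, node_b) pairs, fetching each name once."""
    cache = {}

    def lookup(name):
        # Only call fetch_node the first time a name is seen
        if name not in cache:
            cache[name] = fetch_node(name)
        return cache[name]

    pairs = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((lookup(fields[0]), lookup(fields[1])))
    return pairs


# Hypothetical stand-in for the database lookup: records every call made
calls = []

def fake_fetch(name):
    calls.append(name)
    return {"unique_name": name}


pairs = pair_nodes(["a;b\n", "b;c\n", "a;c\n"], fake_fetch)
print(len(pairs), calls)  # 3 ['a', 'b', 'c']
```

Three lines reference six node names, but only three lookups are made. The saving grows with how often the same node appears in the file; if every node appears only once, the cache does not help.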

Upvotes: 2
