user17375

Reputation: 529

py2neo: minimizing write-time when creating graph

I want to write a huge graph to Neo4j. With my current code, this would take slightly less than two months.

The data comes from Kaggle's events recommendation challenge. The user_friends.csv file I am using looks like this:

user,friends
3197468391,1346449342 3873244116 4226080662, ... 

I used the py2neo batch facility to write the code. Is this the best I can do, or is there another way to significantly reduce the running time?

Here's the code:

#!/usr/bin/env python

from __future__ import division
from time import time
import sqlite3
from py2neo import neo4j

graph = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
batch = neo4j.WriteBatch(graph)

# Indexes used to look up existing nodes and relationships
people = graph.get_or_create_index(neo4j.Node, "people")
friends = graph.get_or_create_index(neo4j.Relationship, "friends")

# The CSV data was loaded into a SQLite table beforehand
con = sqlite3.connect("test.db")
c = con.cursor()
c.execute("SELECT user, friends FROM user_friends LIMIT 2;")

t = time()
for u_f in c:
    # u_f[0] is the user ID, u_f[1] the space-separated list of friend IDs
    u_node = graph.get_or_create_indexed_node("people", 'name', u_f[0])

    for f in u_f[1].split(" "):
        f_node = graph.get_or_create_indexed_node("people", 'name', f)

        # Only queue the relationship if none exists in either direction
        if not f_node.is_related_to(u_node, neo4j.Direction.BOTH, "friends"):
            batch.create((u_node, 'friends', f_node))

    # Submit the relationships queued for this user
    batch.submit()
print(time() - t)

Also, I cannot find a way to create an undirected graph using the high-level py2neo facilities. I know Cypher can do this with something like create (node(1) -[:friends]-node(2))

Thanks in advance.

Upvotes: 0

Views: 323

Answers (1)

Peter Neubauer

Reputation: 6331

You should not create relationships with Direction.BOTH. Choose one direction when creating them, and then ignore the direction by using Direction.BOTH when traversing. This has no performance impact, and the relationship directions are then deterministic. Cypher does exactly that when you write a--b.
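
As a minimal sketch of what "choose one direction" could look like before anything is sent to Neo4j (this is plain Python, not py2neo API; the function names canonical_edge and unique_friend_edges are hypothetical, and the sample rows use made-up IDs), one option is to order each pair of IDs so that the smaller one is always the start node, and to collect the pairs in a set so that each friendship appears only once:

```python
def canonical_edge(a, b):
    """Return the pair ordered so the smaller ID is always the start node."""
    return (a, b) if a <= b else (b, a)


def unique_friend_edges(rows):
    """Collect each friendship once, whichever user's row lists it.

    `rows` is an iterable of (user, friends) pairs, where `friends` is a
    space-separated string of IDs, as in user_friends.csv.
    """
    edges = set()
    for user, friends in rows:
        user_id = int(user)
        for friend in friends.split():
            edges.add(canonical_edge(user_id, int(friend)))
    return edges


# Hypothetical sample rows (made-up IDs, not taken from the Kaggle file).
rows = [
    ("1", "2 3"),
    ("2", "1 3"),
]
edges = sorted(unique_friend_edges(rows))
```

Under this assumption, the friendship between users 1 and 2 is stored once as (1, 2) even though both rows mention it. Each pair would still have to be mapped to its two nodes before creating the relationship, but the direction of every relationship is then fixed by the IDs alone.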

Upvotes: 1
