Reputation: 869
Does anyone have experience parsing and importing data into Neo4j using py2neo and Python? I'm currently trying to parse a relatively large (18700 rows x 17 columns) .csv file and store the resulting nodes and relationships in Neo4j. With py2neo, one must first create a model inheriting from py2neo.data.Node and then use
    for n in nodes:
        tx = graph.begin()   # one transaction per node
        tx.create(n)
        tx.commit()
    for r in relations:
        tx = graph.begin()   # one transaction per relationship
        tx.create(r)
        tx.commit()
to store all the data. Parsing and storing together take roughly 2.5 min (real time) when run with time python ..., split about evenly between parsing and storing.
Another way is to build one big query string, which I managed to do. Once it is built, one can run graph.run(big_query_string) to do the same job. Now it takes about 3 seconds to parse and 2.5 min to store. When I ran the same query string directly in the browser, it took over 3 minutes.
There are two of us on the same project: I work with Neo4j and a colleague works with DGraph. The parsing code is essentially the same, but storing to DGraph takes at most 5 seconds...
Does anyone have experience with this?
UPDATE: There are exactly 115139 "CREATE" statements in the query.
Upvotes: 0
Views: 2349
Reputation: 793
It looks like your code is iterating node by node. If you have lots of data to import, using a CSV file will be much more efficient. Maybe your current CSVs can be used directly?
I use Python code to create, modify or directly use CSV files and then import them. I am not a Python guru, but this will give you an example of one way to do this:
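First, set up the connection to Neo4j:
    # Neo4jLib is presumably the module that holds the helper functions below.
    import Neo4jLib
    # neo4j.v1 is the 1.x driver path; newer drivers use `from neo4j import GraphDatabase`.
    from neo4j.v1 import GraphDatabase
    from py2neo import Graph, Path, Node, Relationship  # http://py2neo.org/v3/
    import re

    importDir = "C:\\Users\\david\\.Neo4jDesktop\\neo4jDatabases\\database-49f9269f-5936-4b08-96b7-c2b3fa3006fa\\installation-3.3.5\\import\\"

    def Neo4jConnectionSetup():
        uri = "bolt://localhost:7687"
        driver = GraphDatabase.driver(uri, auth=("neo4j", "your password"))
        return driver  # hand the driver back so the caller can open sessions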
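Upload:
    def UploadWithPeriodicCommit(Q):
        uri = "bolt://localhost:7687"
        driver = GraphDatabase.driver(uri, auth=("neo4j", "your password"))
        try:
            # session.run() is auto-commit, which USING PERIODIC COMMIT requires
            with driver.session() as session:
                session.run(Q)
        finally:
            driver.close()  # release the connection even if the query fails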
where Q is the Cypher query, such as:
    Neo4jLib.UploadWithPeriodicCommit("""USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM 'file:///vcf.csv' AS line FIELDTERMINATOR '|'
    MERGE (p:PosNode {Pos: toInteger(line.Pos)})""")
Your CSV should go in the import directory of the database in use (importDir above). In the LOAD CSV URL you give only the file name, not the full path.
These uploads and updates run fast.
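For the "create the CSV from Python" step, something like the following might do. It is only a sketch: the function name write_pipe_csv, the Ref column and the sample rows are made up for illustration, and only the '|' delimiter and the Pos header come from the query above.

```python
import csv
import io

def write_pipe_csv(rows, fieldnames, out):
    """Write dict rows to `out` as a '|'-delimited CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="|", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical parsed rows; in practice these come from your own parser.
rows = [{"Pos": 101, "Ref": "A"}, {"Pos": 202, "Ref": "G"}]

# An in-memory buffer keeps the example self-contained.
buf = io.StringIO()
write_pipe_csv(rows, ["Pos", "Ref"], buf)
csv_text = buf.getvalue()
```

To produce a real file, pass open(importDir + "vcf.csv", "w", newline="") as out instead of the buffer, so that LOAD CSV can find it by name.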
Upvotes: 0
Reputation: 4495
Py2neo is not optimised for large imports such as this. You are better off using one of the dedicated import tools for Neo4j instead.
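One such tool is neo4j-admin import, which bulk-loads a new, empty database from CSV files whose headers carry :ID, :LABEL, :START_ID, :END_ID and :TYPE markers. Below is a rough sketch of producing such files from Python. The Person label, the KNOWS type, the property names and the sample data are all invented, and the exact command line differs between Neo4j versions, so check the documentation for yours.

```python
import csv
import io

def write_csv(header, rows, out):
    """Write a header line followed by the data rows as plain CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

# Hypothetical parsed data: (id, name) nodes and (start_id, end_id) relationships.
nodes = [(1, "Alice"), (2, "Bob")]
rels = [(1, 2)]

# Node file: an :ID column, ordinary properties, and a :LABEL column.
nodes_out = io.StringIO()
write_csv(["personId:ID", "name", ":LABEL"],
          [(i, name, "Person") for i, name in nodes], nodes_out)

# Relationship file: :START_ID and :END_ID refer to the node :ID values.
rels_out = io.StringIO()
write_csv([":START_ID", ":END_ID", ":TYPE"],
          [(s, e, "KNOWS") for s, e in rels], rels_out)
```

Writing everything to files once and letting the importer read them avoids sending one statement per node over the wire, which is where the question's approach spends its time.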
Upvotes: 1