Rikard Olsson

Reputation: 869

Import of data into Neo4j with Python takes time

Does anyone have experience parsing and importing data into Neo4j using py2neo and Python? I'm currently trying to parse a relatively large (18700 rows x 17 columns) .csv file and store the resulting nodes and relationships in Neo4j. With py2neo, one must first create a model inheriting from py2neo.data.Node and then use

for n in nodes:
    tx = graph.begin()   # a new transaction for every node
    tx.create(n)
    tx.commit()

for r in relations:
    tx = graph.begin()   # a new transaction for every relationship
    tx.create(r)
    tx.commit()

to store all the data. Parsing and storing together take roughly 2.5 min (real time) when run with time python ..., split about evenly between parsing and storing.

Another way is to build one big query string, which I managed to do. One can then run graph.run(big_query_string) to do the same job. With this approach it takes about 3 seconds to parse and 2.5 min to store. When I ran the same query string directly in the browser, it took over 3 minutes.

There are two of us on the same project: I work with Neo4j and my colleague with DGraph. The parsing code is essentially the same, but storing in DGraph takes at most 5 seconds...

Does anyone have experience with this?

UPDATE There are exactly 115139 "CREATE" statements in the query.

Upvotes: 0

Views: 2349

Answers (2)

David A Stumpf

Reputation: 793

It looks like your code is iterating node by node. If you have lots of data to import, using a CSV file will be much more efficient. Maybe your current CSVs can be used directly?
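If the rows do have to go through a Python driver, the round trips can at least be reduced by sending many rows per statement instead of one. The following is only a sketch under assumptions: the label Item and the helper names chunked and import_in_batches are made up for illustration, and run stands for any callable that accepts a Cypher string plus keyword parameters, such as session.run from the official driver or graph.run from py2neo. It has not been timed against the data in the question.

```python
from itertools import islice

# Hypothetical label; each row is expected to be a dict of property values.
UNWIND_QUERY = "UNWIND $rows AS row CREATE (n:Item) SET n = row"


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_in_batches(run, rows, size=1000):
    """Call run(query, rows=batch) once per batch and return the number of calls."""
    calls = 0
    for batch in chunked(rows, size):
        run(UNWIND_QUERY, rows=batch)
        calls += 1
    return calls
```

With the official driver this could be called as import_in_batches(session.run, rows) inside a with driver.session() as session: block, where rows is a list of dicts, one per CSV line.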

I use Python code to create, modify or directly use CSV files and then import them. I am not a Python guru, but this will give you an example of one way to do this:

First, set up the connection to Neo4j:

import Neo4jLib  # local helper module; UploadWithPeriodicCommit (below) is called through it
from neo4j.v1 import GraphDatabase
from py2neo import Graph, Path, Node, Relationship  # http://py2neo.org/v3/
import re

# Import directory of the database in use
importDir="C:\\Users\\david\\.Neo4jDesktop\\neo4jDatabases\\database-49f9269f-5936-4b08-96b7-c2b3fa3006fa\\installation-3.3.5\\import\\"

def Neo4jConnectionSetup():
    uri = "bolt://localhost:7687"
    driver = GraphDatabase.driver(uri, auth=("neo4j", "your password"))
    return driver  # hand the driver back so the caller can open sessions

Upload:

def UploadWithPeriodicCommit(Q):
    uri = "bolt://localhost:7687"
    driver = GraphDatabase.driver(uri, auth=("neo4j", "your password"))
    try:
        # session.run() uses an auto-commit transaction, which USING PERIODIC COMMIT needs
        with driver.session() as session:
            session.run(Q)
    finally:
        driver.close()

where Q is the Cypher query, such as:

Neo4jLib.UploadWithPeriodicCommit("""USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///vcf.csv' AS line FIELDTERMINATOR '|'
MERGE (p:PosNode {Pos: toInteger(line.Pos)})""")

Your CSV should go in the Import directory of the database in use. You specify only its name and not the full path.

These uploads and updates run fast.
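As a rough sketch of the "create the CSV with Python, then import it" step: the helper names write_pipe_csv and load_csv_query are hypothetical, and the column names are only examples modelled on the vcf.csv query above. The file is written with '|' as the separator so that it matches FIELDTERMINATOR '|', and the query refers to the file by name only.

```python
import csv
import os


def write_pipe_csv(import_dir, filename, header, rows):
    """Write rows to import_dir/filename using '|' as the field separator."""
    path = os.path.join(import_dir, filename)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow(header)
        writer.writerows(rows)
    return path


def load_csv_query(filename):
    """Build a LOAD CSV statement that names the file without its full path."""
    return (
        "USING PERIODIC COMMIT 10000 "
        "LOAD CSV WITH HEADERS FROM 'file:///" + filename + "' AS line "
        "FIELDTERMINATOR '|' "
        "MERGE (p:PosNode {Pos: toInteger(line.Pos)})"
    )
```

Something like write_pipe_csv(importDir, "vcf.csv", ["Pos"], rows) followed by Neo4jLib.UploadWithPeriodicCommit(load_csv_query("vcf.csv")) would then tie the two steps together.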

Upvotes: 0

Nigel Small

Reputation: 4495

Py2neo is not optimised for large imports such as this. You are better off using one of the dedicated import tools for Neo4j instead.
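One such tool is the bulk importer shipped with Neo4j (neo4j-admin import), which reads CSV files whose header row marks the ID, label and relationship columns. A minimal sketch of preparing those files from Python follows; the function name write_admin_import_files and the name column are assumptions for illustration, and the exact command line differs between Neo4j versions, so check the documentation for the installed release.

```python
import csv


def write_admin_import_files(nodes, rels, nodes_path, rels_path):
    """Write node and relationship CSVs with bulk-import style headers.

    nodes: iterable of (id, name, label) tuples
    rels:  iterable of (start_id, end_id, type) tuples
    """
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id:ID", "name", ":LABEL"])
        writer.writerows(nodes)
    with open(rels_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID", ":END_ID", ":TYPE"])
        writer.writerows(rels)
```

The values in :START_ID and :END_ID have to match values from the id:ID column of the node file.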

Upvotes: 1
