Rob

Reputation: 3459

What is the fastest way to import to Neo4j?

I have a list of JSON documents, in the format:

[{a:1, b:[2,5,6]}, {a:2, b:[1,3,5]}, ...]

What I need to do is create a node for each document with property a, and connect it to every node whose a value appears in that document's list b. So the first node will connect to nodes 2, 5 and 6. Right now I'm using Python's neo4jrestclient to populate the database, but it's taking a long time. Is there a faster way to populate it?

Currently this is my script:

break_list = []
for each in ans[1:]:
    ref = each[0]
    q = """MATCH n WHERE n.url = '%s' RETURN n;""" %(ref)
    n1 = gdb.query(q, returns=client.Node)[0][0]
    for link in each[6]:
        if len(link)>4:
            text,link = link.split('!__!')
            q2 = """MATCH n WHERE n.url = '%s' RETURN n;""" %(link)
            try:
                n2 = gdb.query(q2, returns=client.Node)
                n1.relationships.create("Links", n2[0][0], anchor_text=text)
            except:
                break_list.append((ref,link))

Upvotes: 1

Views: 814

Answers (1)

William Lyon

Reputation: 8546

You might want to consider converting your JSON to CSV (using something like jq); then you could use the LOAD CSV Cypher clause for the import. LOAD CSV is optimized for data import, so you should get much better performance with this method. With your example, the LOAD CSV script would look something like this:

Your JSON converted to CSV:

"a","b"
"1","2,5,6"
"2","1,3,5"

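If jq is not at hand, the conversion can also be sketched in a few lines of Python. This is only a minimal sketch: it assumes the input is valid JSON (quoted keys) with the keys a and b from the question, and the function name json_to_csv is made up for illustration.

```python
import csv
import io
import json


def json_to_csv(docs, out):
    """Write one CSV row per document; the list b is joined into one field."""
    # QUOTE_ALL matches the fully quoted layout of the CSV shown above.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["a", "b"])
    for doc in docs:
        # Join with commas so Cypher's split(row.b, ',') can undo it later.
        writer.writerow([doc["a"], ",".join(str(x) for x in doc["b"])])


# Sample input: assumed to be the question's data written as valid JSON.
docs = json.loads('[{"a": 1, "b": [2, 5, 6]}, {"a": 2, "b": [1, 3, 5]}]')

buf = io.StringIO()
json_to_csv(docs, buf)
csv_text = buf.getvalue()
print(csv_text)
```

For a real import you would pass an open file instead of the StringIO buffer, for example open("file.csv", "w", newline="").
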
First, create a uniqueness constraint. This ensures that only one node is created for any "name", and it also creates an index for faster lookups.

CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;

Given the above CSV file this Cypher script can be used to efficiently import data:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///path/to/file.csv" AS row
MERGE (a:Person{name: row.a})
WITH a,row
UNWIND split(row.b,',') AS other
MERGE (b:Person {name:other})
CREATE UNIQUE (a)-[:CONNECTED_TO]->(b);

Other option

Another option is to use the JSON as a parameter in a Cypher query and then iterate through each element of the JSON array using UNWIND.

WITH {d} AS json
UNWIND json AS doc
MERGE (a:Person{name: doc.a})
WITH doc, a
UNWIND doc.b AS other
MERGE (b:Person{name:other})
CREATE UNIQUE (a)-[:CONNECTED_TO]->(b); 

There might be some performance issues with a very large JSON array, though. See some examples of this here and here.
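
One possible way to keep each request small is to send the array in fixed-size batches rather than all at once. The sketch below only shows the batching; run_query is a hypothetical callable standing in for whatever client call you use to execute a parameterized Cypher statement, and the batch size of 1000 is an arbitrary assumption, not a tuned value.

```python
from itertools import islice

# Same statement as above; {d} is the parameter holding one batch of documents.
QUERY = """
WITH {d} AS json
UNWIND json AS doc
MERGE (a:Person {name: doc.a})
WITH doc, a
UNWIND doc.b AS other
MERGE (b:Person {name: other})
CREATE UNIQUE (a)-[:CONNECTED_TO]->(b)
"""


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_in_batches(docs, run_query, batch_size=1000):
    """Call run_query(query, params) once per batch; return documents sent.

    run_query is a hypothetical placeholder for your client's
    "execute Cypher with parameters" call.
    """
    sent = 0
    for chunk in batches(docs, batch_size):
        run_query(QUERY, {"d": chunk})
        sent += len(chunk)
    return sent
```

With a real client, run_query would wrap the call that submits the statement and its parameters to the server; each call then carries only one batch of documents.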

Upvotes: 2
