maaz

Reputation: 4493

Batch Insertion with Neo4j

I am importing 2.3 billion relationships from a table. The import is not very fast: I am getting a speed of 5 million per hour, so the migration will take about 20 days to complete. I have heard about the Neo4j batch inserter and the batch insert utility. The utility does interesting things by importing from a CSV file, but the latest code is somehow broken and does not run.

I already have about 100M relationships in Neo4j, and I have to check all of them so that there are no duplicate relationships.

How can I speed things up in Neo4j?

My current code looks like this:

begin transaction
for each of 50K relationships:
    create or get user node for user A
    create or get user node for user B
    if there is no KNOW relationship from A to B, create the relationship
end transaction

I have also read the following:

Upvotes: 5

Views: 2723

Answers (2)

ulkas

Reputation: 5918

In the case of relationships, and supposing you have enough storage, I would not try to enforce unique relationships during the import phase. Right now I'm also importing an SQL table with ~3 million records, and I always create the relationship without caring whether it is a duplicate or not.

Later, after the import, you can simply run a Cypher query that creates unique relationships, like this:

START n=node(*) MATCH n-[:KNOW]-m
CREATE UNIQUE n-[:KNOW2]-m;

and then delete the original KNOW relationships:

START r=rel(*) where type(r)='KNOW' delete r;

At least this is my approach now, and running these Cypher queries afterwards takes just a few minutes. A problem could arise when you really have billions of nodes: the Cypher query might then run into an out-of-memory error (depending on how much cache you have set up for the Neo4j engine).
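
To make the intended result of the two queries concrete, here is a minimal plain-Java sketch of the same "import everything, collapse duplicates afterwards" idea. It does not use any Neo4j API: the class `KnowDeduplicator` and the representation of a relationship as a `long[]` pair of node ids are assumptions made purely for illustration. Like the undirected pattern in the query above, it treats A-B and B-A as the same relationship. It holds every pair in memory, so it only shows the semantics and would not scale to billions of rows as written.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: collapses duplicate KNOW pairs, ignoring direction. */
class KnowDeduplicator {

    /**
     * Returns one {low, high} pair per distinct relationship,
     * in order of first appearance in the input.
     */
    static List<long[]> dedupe(List<long[]> pairs) {
        Set<String> seen = new HashSet<>();
        List<long[]> unique = new ArrayList<>();
        for (long[] p : pairs) {
            // Normalise direction so (1,2) and (2,1) map to the same key.
            long lo = Math.min(p[0], p[1]);
            long hi = Math.max(p[0], p[1]);
            // Set.add returns false if the key was already present.
            if (seen.add(lo + ":" + hi)) {
                unique.add(new long[] {lo, hi});
            }
        }
        return unique;
    }
}
```

Calling `KnowDeduplicator.dedupe(...)` on the pairs (1,2), (2,1), (1,2), (3,4) keeps two relationships, which is what the CREATE UNIQUE step is meant to achieve inside the database.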

Upvotes: 3

Mattias Finné

Reputation: 3054

How do you do "get user node for user A"? Is it a lookup from an index? Index lookups really slow batch insertions down. Try to cache as large a part of the users as possible in a simple HashMap "in front of" the index, or use BatchInserterIndex#setCacheCapacity.
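
A rough sketch of the "HashMap in front of the index" idea, assuming user keys are strings and node ids are longs. The class `CachedNodeLookup` and its `slowLookup` function are hypothetical names standing in for whatever index query you run today; nothing here is Neo4j API. It uses a `LinkedHashMap` in access order so the cache stays bounded and evicts the least recently used user when it is full.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded LRU cache placed in front of a slow user -> node id lookup. */
class CachedNodeLookup {
    private final Map<String, Long> cache;
    // Assumption: stands in for the real index lookup; returns null if the user is unknown.
    private final Function<String, Long> slowLookup;
    private long hits = 0;
    private long misses = 0;

    CachedNodeLookup(final int capacity, Function<String, Long> slowLookup) {
        this.slowLookup = slowLookup;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                // Evict the least recently used entry once capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    /** Returns the node id for userId, consulting the slow lookup only on a cache miss. */
    Long nodeIdFor(String userId) {
        Long id = cache.get(userId);
        if (id != null) {
            hits++;
            return id;
        }
        misses++;
        id = slowLookup.apply(userId);
        if (id != null) {
            cache.put(userId, id);
        }
        return id;
    }

    long hits() { return hits; }

    long misses() { return misses; }
}
```

You would construct it once, for example `new CachedNodeLookup(1_000_000, key -> yourIndexLookup(key))` where `yourIndexLookup` is your existing code, and call `nodeIdFor` inside the import loop. The capacity is a trade-off between heap usage and how often you fall through to the index.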

Upvotes: 0
