Reputation: 4493
I am importing 2.3 billion relationships from a table. The import is not very fast, getting a speed of about 5 million per hour, which will take 20 days to complete the migration. I have heard about the Neo4j batch insert and the batch insert utility. The utility does interesting stuff by importing from a CSV file, but the latest code is somehow broken and not running.
I already have about 100M relationships in Neo4j, and I have to check that there are no duplicate relationships.
How can I speed things up in Neo4j?
My current code is like:
begin transaction
for 50K relationships
    create or get user node for user A
    create or get user node for user B
    check whether a KNOW relationship exists from A to B; if not, create the relationship
end transaction
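The per-relationship existence check is the expensive part of that loop. One way around it is to track already-created (A, B) pairs in memory instead of asking the store each time. Below is a minimal, Neo4j-free sketch of that bookkeeping; the class and method names are mine, and the key packing assumes node ids fit in 32 bits:

```java
import java.util.HashSet;
import java.util.Set;

public class RelationshipDedup {
    // Tracks which (A, B) pairs already got a KNOW relationship,
    // so the store is never queried during the import loop.
    private final Set<Long> seen = new HashSet<>();

    // Pack the two node ids into one long key; assumes ids fit in 32 bits.
    private static long key(long a, long b) {
        return (a << 32) | (b & 0xFFFFFFFFL);
    }

    // Returns true the first time the pair is seen, i.e. exactly when
    // the relationship should actually be created.
    public boolean shouldCreate(long a, long b) {
        return seen.add(key(a, b));
    }

    public static void main(String[] args) {
        RelationshipDedup dedup = new RelationshipDedup();
        System.out.println(dedup.shouldCreate(1, 2)); // true  -> create
        System.out.println(dedup.shouldCreate(1, 2)); // false -> skip duplicate
        System.out.println(dedup.shouldCreate(2, 1)); // true  -> other direction
    }
}
```

For 2.3 billion pairs a boxed HashSet&lt;Long&gt; will not fit in heap; a primitive long set or sharding the import by source node would be needed, but the pattern is the same.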
I have also read the following:
Upvotes: 5
Views: 2723
Reputation: 5918
In the case of relationships, and supposing you have enough storage, I would try not to make relationships unique in the import phase. Right now I am actually also importing an SQL table with ~3 million records, but I always create the relationship and don't mind whether it is a duplicate or not.
After the import you can simply run a Cypher query which will create unique relationships, like this:
START n=node(*) MATCH n-[:KNOW]-m
CREATE UNIQUE n-[:KNOW2]-m;
and
START r=rel(*) where type(r)='KNOW' delete r;
At least this is my approach now, and running the later Cypher query takes just about minutes. A problem could arise when you really have billions of nodes; then the Cypher query might run into a memory error (depending on how much cache you set up for the Neo4j engine).
Upvotes: 3
Reputation: 3054
How do you do "get user node for user A", a lookup from an index? Index lookups really slow batch insertion down. Try to cache as large a part of the users as possible in a simple HashMap "in front" of the index, or use BatchInserterIndex#setCacheCapacity.
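That caching pattern can be sketched without Neo4j itself: a plain HashMap from user key to node id sits in front of the slow index lookup, which then only runs on a cache miss. The slowLookupOrCreate function below is a hypothetical stand-in for the real index lookup plus node creation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

public class NodeCache {
    private final Map<String, Long> cache = new HashMap<>();
    // Stand-in for the slow path: index lookup, creating the node on a miss.
    private final ToLongFunction<String> slowLookupOrCreate;
    long misses = 0; // counts how often the slow path was actually taken

    public NodeCache(ToLongFunction<String> slowLookupOrCreate) {
        this.slowLookupOrCreate = slowLookupOrCreate;
    }

    // Returns the node id for a user, hitting the slow path at most once per key.
    public long getOrCreate(String user) {
        return cache.computeIfAbsent(user, u -> {
            misses++;
            return slowLookupOrCreate.applyAsLong(u);
        });
    }

    public static void main(String[] args) {
        long[] nextId = {0};                        // fake node-id generator
        NodeCache cache = new NodeCache(u -> nextId[0]++);
        cache.getOrCreate("alice");
        cache.getOrCreate("alice");                 // served from the HashMap
        cache.getOrCreate("bob");
        System.out.println(cache.misses);           // prints 2
    }
}
```

With billions of relationships but far fewer users, such a map stays small relative to the import and turns most "create or get user node" calls into in-memory hits.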
Upvotes: 0