S8N

Reputation: 157

What is the most efficient way to import data into a Neo4j database?

At the moment I have 150 MySQL tables of about 3M rows each, with just over 200 columns per row (the number will only increase). I would like to transfer my tables into a single Neo4j database, with each field becoming a node, so with the numbers above it would be:

so (150 * 3,000,000) + (200 * 10) = 450,002,000 nodes approximately

Which method would be the most appropriate for importing so many nodes and relationships, knowing that there are 6 specific nodes that serve as unique identifiers and must therefore be merged to avoid duplicates?

I think MATCH is extremely heavy, so I guess it should be avoided as much as possible. Do you think it would be useful to check whether a node exists somewhere other than in Neo4j itself (for example, in MongoDB) in order to avoid a MATCH? Is that a good idea?

Thank you in advance.

Upvotes: 0

Views: 186

Answers (2)

David A Stumpf

Reputation: 793

With the number of rows you will be importing, you might consider apoc.periodic.iterate. It batches the loading and retries if there is short-duration locking (a sketch follows the config lines below). Also, consider the garbage that can accumulate (in GB quantities) in the transaction logs. I add a few lines to the config file to keep it cleaned up:

dbms.checkpoint.interval.time=30s
dbms.checkpoint.interval.tx=1
dbms.tx_log.rotation.retention_policy=false
dbms.tx_log.rotation.size=1M
dbms.transaction.timeout=30m
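
A batched import driven by apoc.periodic.iterate might look like the following. This is a minimal sketch assuming the MySQL tables have first been exported to CSV; the file name, the Record label, and the property names are placeholders, not anything from your schema:

// first statement streams the rows; second statement runs once per row,
// committed in batches of 10,000, with up to 3 retries on lock contention
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///my_table.csv' AS row RETURN row",
  "MERGE (n:Record {id: row.id}) SET n.name = row.name",
  {batchSize: 10000, parallel: false, retries: 3});

Setting parallel: false is the safe default when batches touch the same nodes (such as your 6 shared identifier nodes); parallel: true is faster but risks deadlocks in that case.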

As Graphileon mentions, make sure your nodes and relationships are indexed in advance of the load; otherwise you'll have very long load times. Also, keep in mind that any change in properties will create duplicates, even with MERGE.
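To illustrate that last point (with a hypothetical Person label): MERGE matches on the entire pattern you give it, so including a mutable property in the MERGE pattern creates a second node as soon as that property's value changes. Merging on the stable key alone and applying the rest with SET avoids this:

// risky: if row.name has changed, the whole pattern no longer matches,
// so a duplicate Person node is created
MERGE (p:Person {id: row.id, name: row.name});

// safer: merge on the stable key only, then update mutable properties
MERGE (p:Person {id: row.id})
SET p.name = row.name;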

Upvotes: 1

Graphileon

Reputation: 5385

In most scenarios for importing data, you use MERGE instead of MATCH, which allows you to create and re-use unique entities. Make sure you have your CONSTRAINTS set up correctly before you start the import. This is a good place to start reading.
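For instance, a sketch with a hypothetical Person label keyed on a unique id (the exact constraint syntax varies by Neo4j version; this is the pre-4.x form):

// a uniqueness constraint also creates the backing index,
// which is what makes MERGE lookups fast during the import
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;

// MERGE then finds-or-creates against that index; ON CREATE / ON MATCH
// let you set properties differently for new vs. existing nodes
MERGE (p:Person {id: 42})
ON CREATE SET p.created = timestamp()
ON MATCH  SET p.lastSeen = timestamp();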

Upvotes: 2
