Reputation: 4345
I am finding Neo4j slow to add nodes and relationships/arcs/edges when using the REST API via py2neo for Python. I understand that this is due to each REST API call executing as a single self-contained transaction.
Specifically, adding a few hundred pairs of nodes with relationships between them takes a number of seconds, running on localhost.
What is the best approach to significantly improve performance whilst staying with Python?
Would using bulbflow and Gremlin be a way of constructing a bulk insert transaction?
Thanks!
Upvotes: 18
Views: 14689
Reputation: 2614
To insert a large number of nodes into Neo4j at very high speed, use the Batch Inserter:
http://neo4j.com/docs/stable/batchinsert-examples.html
In my case I'm working in Java.
Upvotes: 0
Reputation: 16733
Well, I myself needed massive performance from Neo4j. I ended up doing the following things to improve graph performance.
Upvotes: 2
Reputation: 1638
There are so many old answers to this question online that it took me forever to realize there's an import tool that comes with Neo4j. It's very fast and the best tool I was able to find.
Here's a simple example if we want to import student nodes:
bin/neo4j-import --into [path-to-your-neo4j-directory]/data/graph.db --nodes students
The students file contains data that looks like this, for example:
studentID:ID(Student),name,year:int,:LABEL
1111,Amy,2000,Student
2222,Jane,2012,Student
3333,John,2013,Student
Explanation:
Here's the documentation for it: http://neo4j.com/docs/stable/import-tool-usage.html
Note: I realize the question specifically mentions python, but another useful answer mentions a non-python solution.
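Since the question asks about Python, here is a minimal sketch of generating such a students file from Python with the standard csv module. The function name write_students and the in-memory buffer are illustrative assumptions; the header follows the example above, with the ID field type written in upper case as in the import tool documentation.

```python
import csv
import io


def write_students(rows, out):
    """Write (id, name, year) tuples as a node file for the import tool."""
    writer = csv.writer(out, lineterminator="\n")
    # Header: ID within the "Student" id space, a string, a typed int, a label.
    writer.writerow(["studentID:ID(Student)", "name", "year:int", ":LABEL"])
    for student_id, name, year in rows:
        writer.writerow([student_id, name, year, "Student"])


# An in-memory buffer stands in for a real file opened with open(..., "w", newline="").
buf = io.StringIO()
write_students([(1111, "Amy", 2000), (2222, "Jane", 2012), (3333, "John", 2013)], buf)
students_csv = buf.getvalue()
```

Writing the same rows to a real file gives something that can be passed to the --nodes option shown above.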
Upvotes: 2
Reputation: 4495
There are several ways to do a bulk create with py2neo, each making only a single call to the server.
1. Use the create method to build a number of nodes and relationships in a single batch.
2. Use the WriteBatch class (just released this week) to manually make a batch of nodes and relationships (this is really just a manual version of 1).
If you have some code, I'm happy to look at it and make suggestions on performance tweaks. There are also quite a few tests you may be able to get inspiration from.
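To illustrate what "a single call to the server" can look like, here is a rough sketch that builds one request body for Neo4j's legacy REST batch endpoint (POST to /db/data/batch). It does not use py2neo itself, and it assumes a server version old enough to still expose that endpoint; build_batch and the KNOWS relationship type are made-up names for the example.

```python
import json


def build_batch(pairs, rel_type="KNOWS"):
    """Build one job list creating both nodes of each pair plus a relationship."""
    jobs = []
    for left_props, right_props in pairs:
        left_id = len(jobs)
        jobs.append({"method": "POST", "to": "/node", "body": left_props, "id": left_id})
        right_id = len(jobs)
        jobs.append({"method": "POST", "to": "/node", "body": right_props, "id": right_id})
        # "{n}" refers back to the result of the job with id n in the same batch.
        jobs.append({
            "method": "POST",
            "to": "{%d}/relationships" % left_id,
            "body": {"to": "{%d}" % right_id, "type": rel_type},
            "id": len(jobs),
        })
    return jobs


pairs = [({"name": "Alice"}, {"name": "Bob"}), ({"name": "Carol"}, {"name": "Dave"})]
# The whole list is serialised once and would be sent in a single HTTP request.
payload = json.dumps(build_batch(pairs))
```

Because every node and relationship travels in one request, the per-call transaction overhead described in the question is paid once rather than once per element.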
Cheers, Nige
Upvotes: 9
Reputation: 4814
Neo4j's write performance is slow unless you are doing a batch insert.
The Neo4j batch importer (https://github.com/jexp/batch-import) is the fastest way to load data into Neo4j. It's a Java utility, but you don't need to know any Java because you're just running the executable. It handles typed data and indexes, and it imports from a CSV file.
To use it with Bulbs (http://bulbflow.com/) Models, use the model's get_bundle() method to get the data, index name, and index keys, which are prepared for insert, and then output the data to a CSV file. Or, if you don't want to model your data, just output your data from Python to the CSV file.
Will that work for you?
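As a sketch of the "just output your data from Python" route, the snippet below writes a nodes file and a relationships file with the standard csv module. The tab delimiter, the column names (name, age:int, start, end, type) and the helper write_tsv are assumptions for illustration, and how start and end refer to nodes depends on the importer version, so check the batch-import README before relying on this layout.

```python
import csv
import io


def write_tsv(header, rows, out):
    """Write a header line and data rows as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# In-memory buffers stand in for nodes.csv and rels.csv on disk.
nodes_out, rels_out = io.StringIO(), io.StringIO()
write_tsv(["name", "age:int"], [("Amy", 31), ("Jane", 28)], nodes_out)
# start/end values are passed through unchanged; their meaning is importer-specific.
write_tsv(["start", "end", "type"], [(1, 2, "KNOWS")], rels_out)
```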
Upvotes: 6