Reputation: 1089
I use the following code to create a graph with the Neo4j graph database:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class Neo4jMassiveInsertion implements Insertion {

    private BatchInserter inserter = null;
    private BatchInserterIndexProvider indexProvider = null;
    private BatchInserterIndex nodes = null;

    private enum RelTypes implements RelationshipType {
        SIMILAR
    }

    public static void main(String[] args) {
        Neo4jMassiveInsertion test = new Neo4jMassiveInsertion();
        test.startup("data/neo4j");
        test.createGraph("data/enronEdges.txt");
        test.shutdown();
    }

    /**
     * Start the Neo4j database and configure it for massive insertion.
     * @param neo4jDBDir
     */
    public void startup(String neo4jDBDir) {
        System.out.println("The Neo4j database is now starting . . . .");
        Map<String, String> config = new HashMap<String, String>();
        inserter = BatchInserters.inserter(neo4jDBDir, config);
        indexProvider = new LuceneBatchInserterIndexProvider(inserter);
        nodes = indexProvider.nodeIndex("nodes", MapUtil.stringMap("type", "exact"));
    }

    public void shutdown() {
        System.out.println("The Neo4j database is now shutting down . . . .");
        if (inserter != null) {
            indexProvider.shutdown();
            inserter.shutdown();
            indexProvider = null;
            inserter = null;
        }
    }

    public void createGraph(String datasetDir) {
        System.out.println("Creating the Neo4j database . . . .");
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(datasetDir)));
            String line;
            int lineCounter = 1;
            Map<String, Object> properties;
            IndexHits<Long> cache;
            long srcNode, dstNode;
            while ((line = reader.readLine()) != null) {
                // Skip the four header lines of the edge list.
                if (lineCounter > 4) {
                    String[] parts = line.split("\t");
                    // Look up the source node in the Lucene index, creating it if absent.
                    cache = nodes.get("nodeId", parts[0]);
                    if (cache.hasNext()) {
                        srcNode = cache.next();
                    } else {
                        properties = MapUtil.map("nodeId", parts[0]);
                        srcNode = inserter.createNode(properties);
                        nodes.add(srcNode, properties);
                        nodes.flush();
                    }
                    // Same lookup-or-create for the destination node.
                    cache = nodes.get("nodeId", parts[1]);
                    if (cache.hasNext()) {
                        dstNode = cache.next();
                    } else {
                        properties = MapUtil.map("nodeId", parts[1]);
                        dstNode = inserter.createNode(properties);
                        nodes.add(dstNode, properties);
                        nodes.flush();
                    }
                    inserter.createRelationship(srcNode, dstNode, RelTypes.SIMILAR, null);
                }
                lineCounter++;
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Compared with other graph database technologies (Titan, OrientDB) it takes far too much time, so maybe I am doing something wrong. Is there a way to speed up the procedure?
I use Neo4j 1.9.5 and my machine has a 2.3 GHz CPU (i5), 4 GB of RAM and a 320 GB disk, running Mac OS X Mavericks (10.9). My heap size is set to 2 GB.
Upvotes: 1
Views: 1072
Reputation: 41676
Usually I can import about 1M nodes and 200k relationships per second on my MacBook.
Please don't flush & search on every insert; that totally kills performance. Keep your nodeIds in a HashMap from your data to node-id, and only write to Lucene during the import.
(If you care about memory usage, you can also go with something like GNU Trove.)
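A minimal sketch of that lookup cache, assuming the same BatchInserter and BatchInserterIndex objects as in your code (the NodeIdCache class and getOrCreate names are just illustrative):

import java.util.HashMap;
import java.util.Map;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;

public class NodeIdCache {
    // Maps the raw nodeId string from the input file to the created node's id.
    private final Map<String, Long> idCache = new HashMap<String, Long>();

    // Resolve a nodeId to its node, creating the node on first sight.
    // The Lucene index is only written to during the import, never queried
    // and never flushed per line.
    public long getOrCreate(String nodeId, BatchInserter inserter, BatchInserterIndex nodes) {
        Long id = idCache.get(nodeId);
        if (id == null) {
            Map<String, Object> properties = MapUtil.map("nodeId", nodeId);
            id = inserter.createNode(properties);
            nodes.add(id, properties);
            idCache.put(nodeId, id);
        }
        return id;
    }
}

In your loop that becomes srcNode = nodeIdCache.getOrCreate(parts[0], inserter, nodes); and the nodes.flush() calls go away entirely (flush once after the whole import if you still need the index afterwards).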
You are also using too little RAM (I usually use heaps between 4 and 60 GB, depending on the data set size), and you don't have any config set.
As a sensible config, check something like this; depending on your data volume, I'd raise these numbers:
cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1000M
neostore.propertystore.db.mapped_memory=250M
neostore.propertystore.db.strings.mapped_memory=250M
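You already pass a config map to BatchInserters.inserter in startup(), so these settings can go straight in there; a sketch using the same values as above:

Map<String, String> config = new HashMap<String, String>();
config.put("cache_type", "none");
config.put("use_memory_mapped_buffers", "true");
config.put("neostore.nodestore.db.mapped_memory", "200M");
config.put("neostore.relationshipstore.db.mapped_memory", "1000M");
config.put("neostore.propertystore.db.mapped_memory", "250M");
config.put("neostore.propertystore.db.strings.mapped_memory", "250M");
inserter = BatchInserters.inserter(neo4jDBDir, config);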
And make sure to give it enough heap; try to increase yours to at least 3 GB. Your disk might also not be the fastest. Also make sure to have the latest JDK; 1.7.._b25 had a memory allocation issue (it allocated only a tiny bit of memory for the
Upvotes: 1