Reputation: 127
Neo4j : Enterprise version 3.2
I see a tremendous difference in speed between the following two approaches. Here are the settings and the query/API calls.
Page Cache : 16g | Heap : 16g
Number of rows/nodes: 600K
Cypher code (ignore syntax errors, if any) | Time taken: 50 sec.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///xyx.csv' AS row WITH row
CREATE (n:ObjectTension) SET n = row
From Java (session pool, with 15 sessions at a time as an example):
Thread_1 : Time Taken : 8 sec / 10K
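Map<String, Object> pList = new HashMap<String, Object>();
Map<String, Object> params = new HashMap<String, Object>();
try (Transaction tx = Driver.session().beginTransaction()) {
    for (int i = 0; i < 10000; i++) {
        pList.put(Integer.toString(i), i * i); // map keys must be strings
        params.put("props", pList);
        String query = "CREATE (n:Label {props})";
        // String query = "CREATE (n:Label) SET n = {props}";
        tx.run(query, params);
    }
    tx.success(); // 1.x driver: mark the transaction for commit on close
}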
Thread_2 : Time taken is 9 sec / 10K
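Map<String, Object> pList = new HashMap<String, Object>();
Map<String, Object> params = new HashMap<String, Object>();
try (Transaction tx = Driver.session().beginTransaction()) {
    for (int i = 0; i < 10000; i++) {
        pList.put(Integer.toString(i), i * i); // map keys must be strings
        params.put("props", pList);
        String query = "CREATE (n:Label {props})";
        // String query = "CREATE (n:Label) SET n = {props}";
        tx.run(query, params);
    }
    tx.success(); // 1.x driver: mark the transaction for commit on close
}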
.
.
.
Thread_3 : Basically the above code is reused. It's just an example.
Thread_N where N = (600K / 10K)
Hence, the overall time taken is around 2 ~ 3 mins.
The questions are the following:
Or
Create multiple sessions based on the value passed to "USING PERIODIC COMMIT 10000"? With this, 600K/10000 gives 60 sessions, etc.
The idea is to achieve the same write performance from Java as with the CSV load, since the CSV load creates 12000 nodes in ~5 seconds or even better.
Upvotes: 0
Views: 413
Reputation: 66999
Your Java code is doing something very different than your Cypher code, so it really makes no sense to compare processing times.
You should change your Java code to read from the same CSV file. File IO is fairly expensive, but your Java code is not doing any.
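As a rough illustration of what reading the same CSV file from Java could involve, here is a minimal sketch that turns header-prefixed CSV lines into one property map per row. The class name CsvRows and the method parse are hypothetical, and the parsing is deliberately naive (it splits on commas and does not handle quoted fields). Values are kept as strings, which is also how LOAD CSV delivers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Neo4j driver.
final class CsvRows {

    // Naive parsing: splits on commas and does not handle quoted fields.
    // The first line is treated as the header row.
    static List<Map<String, Object>> parse(List<String> lines) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] headers = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            Map<String, Object> row = new LinkedHashMap<>();
            for (int c = 0; c < headers.length && c < cells.length; c++) {
                row.put(headers[c], cells[c]); // values stay strings
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In real use the lines could come from
        // java.nio.file.Files.readAllLines(path).
        List<String> lines = Arrays.asList("id,name", "1,alice", "2,bob");
        System.out.println(parse(lines));
    }
}
```

Each resulting map has a fixed set of keys (one per column), so it can serve as the props parameter for a single node.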
Also, whereas your pure Cypher query is creating nodes with a fixed (and presumably relatively small) number of properties, your Java pList is growing in size with every loop iteration, so that each Java loop creates nodes with between 1 and 10K properties! This may be the main reason why your Java code is much slower.
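To get a feel for how much extra data the growing map produces, the following sketch simulates the question's loop without touching the database and counts how many property values would be sent in one 10K-iteration transaction. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it only counts properties and never talks to Neo4j.
final class GrowingMapDemo {

    // Mimics the question's loop: one map is shared across all iterations,
    // and one new key is added before each (simulated) CREATE.
    static long totalPropertiesSent(int iterations) {
        Map<String, Object> pList = new HashMap<>();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            pList.put(Integer.toString(i), i * i);
            total += pList.size(); // properties the i-th node would receive
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalPropertiesSent(10_000));
    }
}
```

For 10,000 iterations the total is 1 + 2 + ... + 10,000 = 50,005,000 property values, compared with 100,000 if every node had a fixed 10 properties.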
[UPDATE 1]
If you want to ignore the performance difference between using and not using a CSV file, the following (untested) code should give you an idea of what similar logic would look like in Java. In this example, the i loop assumes that your CSV file has 10 columns (you should adjust the loop to use the correct column count). Also, this example gives all the nodes the same properties, which is OK as long as you have not created a contrary uniqueness constraint.
Session session = Driver.session();
Map<String, Object> pList = new HashMap<String, Object>();
for (int i = 0; i < 10; i++) {
    pList.put(Integer.toString(i), i * i); // property keys must be strings
}
Map<String, Object> params = new HashMap<String, Object>();
params.put("props", pList);
String query = "CREATE (n:Label) SET n = {props}";
for (int j = 0; j < 60; j++) {
    try (Transaction tx = session.beginTransaction()) {
        for (int k = 0; k < 10000; k++) {
            tx.run(query, params);
        }
        tx.success(); // 1.x driver: without this, close() rolls back
    }
}
[UPDATE 2 and 3, copied from chat and then fixed]
Since the Cypher planner is able to optimize, the actual internal logic is probably a lot more efficient than the Java code I provided (above). If you want to also optimize your Java code (which may be closer to the code that Cypher actually generates), try the following (untested) code. It sends 10000 rows of data in a single run() call, and uses the UNWIND clause to break it up into individual rows on the server.
Session session = Driver.session();
Map<String, Object> pList = new HashMap<String, Object>();
for (int i = 0; i < 10; i++) {
    pList.put(Integer.toString(i), i * i);
}
// 10000 copies of the same row, sent as one list parameter
List<Map<String, Object>> rows = Collections.nCopies(10000, pList);
Map<String, Object> params = new HashMap<String, Object>();
params.put("rows", rows);
// row is a Cypher variable bound by UNWIND, not a parameter
String query = "UNWIND {rows} AS row CREATE (n:Label) SET n = row";
for (int j = 0; j < 60; j++) {
    try (Transaction tx = session.beginTransaction()) {
        tx.run(query, params);
        tx.success(); // 1.x driver: without this, close() rolls back
    }
}
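With real data the rows differ, so the full list has to be cut into chunks of 10000 before each chunk is passed as the rows parameter. A minimal sketch of that splitting step follows; RowBatcher and partition are hypothetical names, and the database call itself is left out so the example runs without a Neo4j instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Neo4j driver.
final class RowBatcher {

    // Splits rows into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in data: 600K rows with a single made-up "id" property.
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int r = 0; r < 600_000; r++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", r);
            rows.add(row);
        }
        List<List<Map<String, Object>>> batches = partition(rows, 10_000);
        System.out.println(batches.size() + " batches");
    }
}
```

Each batch would then replace the nCopies list: put it into params under "rows" and call tx.run(query, params) once per transaction, which gives 60 transactions for 600K rows.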
Upvotes: 1
Reputation: 4554
You can try creating the nodes using the Java API, instead of relying on Cypher:
- createNode: http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/GraphDatabaseService.html#createNode-org.neo4j.graphdb.Label...-
- setProperty: http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/PropertyContainer.html#setProperty-java.lang.String-java.lang.Object-
Also, as the previous answer mentioned, the props variable has different values in your two cases.
Additionally, notice that on every iteration you are performing query parsing (String query = "Create(n:Label {props})";) unless it is optimized out by Neo4j itself.
Upvotes: 0