Reputation: 155
I created 1 million Neo4j nodes in batches of 10000, each batch in its own transaction. The strange thing is that parallelizing this process with multi-threaded execution did not have any positive effect on performance. It is as if the transactions in different threads are blocking each other.
Here's a piece of Scala code that tests this with the help of parallel collections:
import org.neo4j.kernel.EmbeddedGraphDatabase
object Main extends App {
  val total = 1000000
  val batchSize = 10000
  val db = new EmbeddedGraphDatabase("neo4yay")
  // Shut the database down cleanly when the JVM exits
  Runtime.getRuntime().addShutdownHook(
    new Thread(){override def run() = db.shutdown()}
  )
  // Remove .par to run the batches sequentially
  (1 to total).grouped(batchSize).toSeq.par.foreach(batch => {
    println("thread %s, nodes from %d to %d"
      .format(Thread.currentThread().getId, batch.head, batch.last))
    val transaction = db.beginTx()
    try{
      // One node per number in the batch
      batch.foreach(db.createNode().setProperty("Number", _))
      // Mark the transaction as successful; otherwise finish() rolls it back
      transaction.success()
    }finally{
      transaction.finish()
    }
  })
}
And here are the build.sbt lines needed for building and running it:
scalaVersion := "2.9.2"
libraryDependencies += "org.neo4j" % "neo4j-kernel" % "1.8.M07"
fork in run := true
One can switch between parallel and sequential modes by adding or removing the .par invocation before the outer foreach. The console output clearly shows that with .par, execution is indeed multi-threaded.
To rule out possible problems with concurrency in this code, I have also tried an actor-based implementation, with about the same result (6 and 7 seconds for sequential and parallel versions, respectively).
So, the question is: did I do something wrong, or is this a Neo4j limitation? Thanks!
Upvotes: 4
Views: 1460
Reputation: 41706
The main issue is that your transactions arrive at about the same time, and transaction commits are serialized writes to the transaction log. If the commits were interleaved in time and the actual node creation were a more expensive process, you would get a speedup.
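To illustrate the point, here is a minimal sketch that does not use Neo4j at all. It assumes each batch has a parallelizable "prepare" phase and a "commit" phase that only one thread can run at a time, with a plain lock standing in for the transaction log. The names SerializedCommitSketch, runBatch and timeBatches, and the sleep-based durations, are made up for illustration.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.locks.ReentrantLock

object SerializedCommitSketch {
  // Stand-in for the transaction log: only one thread may "commit" at a time.
  private val logLock = new ReentrantLock()
  // Number of simulated commits performed so far.
  val committed = new AtomicInteger(0)

  // One simulated batch: a parallelizable part followed by a serialized commit.
  def runBatch(prepareMillis: Long, commitMillis: Long): Unit = {
    Thread.sleep(prepareMillis) // can overlap with other threads
    logLock.lock()
    try {
      Thread.sleep(commitMillis) // cannot overlap: the lock is held
      committed.incrementAndGet()
    } finally {
      logLock.unlock()
    }
  }

  // Runs `batches` simulated batches on `threads` threads and
  // returns the elapsed wall-clock time in milliseconds.
  def timeBatches(threads: Int, batches: Int,
                  prepareMillis: Long, commitMillis: Long): Long = {
    val pool = Executors.newFixedThreadPool(threads)
    val start = System.nanoTime()
    (1 to batches).foreach { _ =>
      pool.execute(new Runnable {
        def run(): Unit = runBatch(prepareMillis, commitMillis)
      })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000L
  }
}
```

Comparing timeBatches(1, 8, 1, 20) with timeBatches(4, 8, 1, 20) should show little difference, because almost all of the time is spent inside the lock. Swapping the two durations, as in timeBatches(4, 8, 20, 1), makes the prepare phase dominate, and that is where extra threads start to pay off.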
Upvotes: 4
Reputation: 1777
Batch insertion does not work with multiple threads. From the Neo4j documentation:
Always perform batch insertion in a single thread (or use synchronization to make only one thread at a time access the batch inserter) and invoke shutdown when finished.
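The "use synchronization" option from that quote could be sketched roughly as follows. NodeInserter and InMemoryInserter are hypothetical stand-ins for a non-thread-safe inserter, not the Neo4j API; only the SynchronizedInserter wrapper is the point of the example.

```scala
// Hypothetical stand-in for a non-thread-safe inserter (not the Neo4j API).
trait NodeInserter {
  def createNode(properties: Map[String, Any]): Long
  def shutdown(): Unit
}

// Toy implementation that just hands out sequential ids without any locking.
final class InMemoryInserter extends NodeInserter {
  private var nextId = 0L
  var isShutDown = false
  def createNode(properties: Map[String, Any]): Long = {
    val id = nextId
    nextId += 1
    id
  }
  def shutdown(): Unit = { isShutDown = true }
}

// Wrapper that lets only one thread at a time reach the underlying inserter.
final class SynchronizedInserter(underlying: NodeInserter) extends NodeInserter {
  def createNode(properties: Map[String, Any]): Long =
    synchronized { underlying.createNode(properties) }
  def shutdown(): Unit =
    synchronized { underlying.shutdown() }
}
```

Several threads can then share one SynchronizedInserter, but the calls still execute one after another, so this buys safety rather than speed.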
Upvotes: 2