user3175226

Reputation: 3669

Scaling Neo4j for simultaneous write/MERGE queries

Using NodeJS, I'm collecting this data into Neo4j:

This is totally async, which means I can often receive info about a USER and an APP that don't exist in the database yet.

So, instead of just CREATEing the items and then CREATEing the relationships, I need to run a MERGE on USER and APP for each row to ensure they exist in the database. Only then can I create the relationships.

I'm sending this to the /cypher endpoint:

params:{props:objects_array},
query:[
  ' FOREACH (p IN {props} | ',
  '   MERGE (u:user    {id:p._user_id})    SET u.id = p._user_id ',
  '   MERGE (a:app     {id:p._app_id})     SET a.id = p._app_id ',
  '   MERGE (m:machine {id:p._machine_id}) SET m.id = p._machine_id ',
  '   MERGE (u)-[:OPENED]->(a) ',
  '   MERGE (a)-[:USERS]->(u) ',
  '   MERGE (u)-[:WORK_IN]->(m) ',
  ' )',
].join("")

This works, but it's very slow. I'm balancing the load by tuning the number of simultaneous requests and the number of rows per request.

With 5 simultaneous requests of 500 rows each, it completes 5000 rows in 2 minutes, with 4 deadlock errors along the way.
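For reference, the batching scheme I describe looks roughly like this (a minimal sketch; `chunk` and the row counts are illustrative, and the request loop that caps in-flight requests at 5 is not shown):

```javascript
// Split the collected rows into fixed-size chunks; each chunk becomes
// one request to the /cypher endpoint. The concurrency limit (5) is
// enforced separately by the request loop.
function chunk(rows, size) {
  var chunks = [];
  for (var i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// 5000 rows at 500 rows per request -> 10 requests total.
var batches = chunk(new Array(5000), 500);
console.log(batches.length); // 10
```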

The problem is that it keeps both CPU cores at 99% the whole time (DigitalOcean, 2 cores, 4 GB RAM), and I need to scale this to 150 simultaneous requests, not just 5.

I think possible solutions are:

Upvotes: 2

Views: 693

Answers (2)

Srinivas Kattimani

Reputation: 336

I don't know about Cypher, but with the Neo4j REST API you can achieve this easily using batch insertion. You can also create relationships in the same batch operation as the nodes, with later jobs in the batch referring to items created earlier.
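A sketch of what such a batch payload (POST to /db/data/batch) could look like: later jobs refer to nodes created earlier in the same batch via the `{jobId}` placeholder. The property values are illustrative, and note that the batch API creates nodes rather than merging them, so deduplication would have to be handled separately:

```javascript
// Build one batch of jobs for a single row. Job ids 0 and 1 create the
// nodes; job 2 creates the relationship between them, referencing the
// earlier jobs with {0} and {1}.
function buildBatch(row) {
  return [
    { method: 'POST', to: '/node', id: 0, body: { id: row._user_id } },
    { method: 'POST', to: '/node', id: 1, body: { id: row._app_id } },
    { method: 'POST', to: '{0}/relationships', id: 2,
      body: { to: '{1}', type: 'OPENED' } }
  ];
}

var jobs = buildBatch({ _user_id: 'u1', _app_id: 'a1' });
console.log(jobs[2].body.to); // {1}
```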

Upvotes: 0

Jacob Davis-Hansson

Reputation: 2663

If you use MERGE without any unique constraints or indexes, Neo4j is required to do a full scan of the node space for each MERGE to verify that no other node exists with the given property. That gives you very high algorithmic complexity, and the operation also becomes disk bound.

You need to create either an index or (preferably) a unique constraint for the label/property tuples that are supposed to be unique in the database. This should significantly speed up your MERGE query.
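For the labels in the question, the unique constraints could be created once up front, using Neo4j 2.x Cypher syntax (each constraint also backs the MERGE lookups with an index):

```cypher
CREATE CONSTRAINT ON (u:user)    ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (a:app)     ASSERT a.id IS UNIQUE;
CREATE CONSTRAINT ON (m:machine) ASSERT m.id IS UNIQUE;
```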

Also, you'll be better off using the new transactional endpoint (http://docs.neo4j.org/chunked/stable/rest-api-transactional.html), which replaces the cypher endpoint. It supports the same features but performs better, lets you run multiple Cypher statements per HTTP call, and allows transactions to remain open on the server, giving clients long-running transaction support.
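A sketch of a request body for the transactional endpoint (POST to /db/data/transaction/commit), reusing the MERGE query from the question in abbreviated form; `buildTxBody` is a hypothetical helper name:

```javascript
// The transactional endpoint takes a "statements" array, so several
// Cypher statements (each with its own parameters) can be sent in a
// single HTTP call.
function buildTxBody(rows) {
  return {
    statements: [{
      statement: [
        'FOREACH (p IN {props} | ',
        '  MERGE (u:user {id: p._user_id}) ',
        '  MERGE (a:app  {id: p._app_id}) ',
        '  MERGE (u)-[:OPENED]->(a) ',
        ')'
      ].join(''),
      parameters: { props: rows }
    }]
  };
}

var body = buildTxBody([{ _user_id: 'u1', _app_id: 'a1' }]);
console.log(body.statements.length); // 1
```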

Past that, Neo4j 2.1 is due to be released in the next few months, and it contains several performance enhancements that significantly speed up concurrent query execution. You may want to try the upcoming milestone to see how that helps your performance.

Upvotes: 3
