user3175226

Reputation: 3669

Scaling Neo4j for simultaneous write/MERGE queries

Using NodeJS, I'm collecting this data into Neo4j:

This is totally async, which means I can often receive info about a USER and an APP that don't exist in the database yet.

So, instead of just CREATEing the items and then CREATEing the relationships, I need to run a MERGE on USER and APP for each row to ensure they exist in the database. Only then can I create the relationships.

I'm sending this to the /cypher endpoint:

params:{props:objects_array},
query:[
  ' FOREACH (p IN {props} | ',
  '   MERGE (u:user    {id:p._user_id})    SET u.id = p._user_id ',
  '   MERGE (a:app     {id:p._app_id})     SET a.id = p._app_id ',
  '   MERGE (m:machine {id:p._machine_id}) SET m.id = p._machine_id ',
  '   MERGE (u)-[:OPENED]->(a) ',
  '   MERGE (a)-[:USERS]->(u) ',
  '   MERGE (u)-[:WORK_IN]->(m) ',
  ' )',
].join("")

This works, but it's very slow. I'm balancing the load by tuning the number of simultaneous requests and the number of rows per request.

With 5 simultaneous requests of 500 rows each, it completes 5000 rows in 2 minutes, with 4 deadlock errors along the way.
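For reference, the batching scheme I describe looks roughly like this (a minimal sketch; `chunk` and the row counts are illustrative, and the request loop that caps in-flight requests at 5 is not shown):

```javascript
// Split the collected rows into fixed-size chunks; each chunk becomes
// one request to the /cypher endpoint. The concurrency limit (5) is
// enforced separately by the request loop.
function chunk(rows, size) {
  var chunks = [];
  for (var i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// 5000 rows at 500 rows per request -> 10 requests total.
var batches = chunk(new Array(5000), 500);
console.log(batches.length); // 10
```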

The problem is that it keeps both CPU cores at 99% the whole time (DigitalOcean, 2 cores, 4 GB RAM), and I need to scale this to 150 simultaneous requests, not just 5.

I think possible solutions are:

Upvotes: 2

Views: 693

Answers (2)

Srinivas Kattimani

Reputation: 336

I don't know about Cypher, but with the Neo4j REST API you can achieve this easily using batch insertion. You can also create relationships in the same batch operation as the nodes, with later jobs in the batch referring to items created earlier.
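A sketch of what such a batch payload (POST to /db/data/batch) could look like: later jobs refer to nodes created earlier in the same batch via the `{jobId}` placeholder. The property values are illustrative, and note that the batch API creates nodes rather than merging them, so deduplication would have to be handled separately:

```javascript
// Build one batch of jobs for a single row. Job ids 0 and 1 create the
// nodes; job 2 creates the relationship between them, referencing the
// earlier jobs with {0} and {1}.
function buildBatch(row) {
  return [
    { method: 'POST', to: '/node', id: 0, body: { id: row._user_id } },
    { method: 'POST', to: '/node', id: 1, body: { id: row._app_id } },
    { method: 'POST', to: '{0}/relationships', id: 2,
      body: { to: '{1}', type: 'OPENED' } }
  ];
}

var jobs = buildBatch({ _user_id: 'u1', _app_id: 'a1' });
console.log(jobs[2].body.to); // {1}
```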

Upvotes: 0

Jacob Davis-Hansson

Reputation: 2663

If you use MERGE without any unique constraints or indexes, Neo4j is required to do a full scan of the node space for each MERGE to verify that no other node exists with the given property. That gives you very high algorithmic complexity, and the operation also becomes disk bound.

You need to create either an index or (preferably) a unique constraint for the label/property tuples that are supposed to be unique in the database. This should significantly speed up your MERGE query.
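For the labels in the question, the unique constraints could be created once up front, using Neo4j 2.x Cypher syntax (each constraint also backs the MERGE lookups with an index):

```cypher
CREATE CONSTRAINT ON (u:user)    ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (a:app)     ASSERT a.id IS UNIQUE;
CREATE CONSTRAINT ON (m:machine) ASSERT m.id IS UNIQUE;
```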

Also, you'll be better off using the new transactional endpoint (http://docs.neo4j.org/chunked/stable/rest-api-transactional.html), which replaces the cypher endpoint. It supports the same features but performs better, lets you run multiple Cypher statements per HTTP call, and allows transactions to remain open on the server, giving clients long-running transaction support.
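A sketch of a request body for the transactional endpoint (POST to /db/data/transaction/commit), reusing the MERGE query from the question in abbreviated form; `buildTxBody` is a hypothetical helper name:

```javascript
// The transactional endpoint takes a "statements" array, so several
// Cypher statements (each with its own parameters) can be sent in a
// single HTTP call.
function buildTxBody(rows) {
  return {
    statements: [{
      statement: [
        'FOREACH (p IN {props} | ',
        '  MERGE (u:user {id: p._user_id}) ',
        '  MERGE (a:app  {id: p._app_id}) ',
        '  MERGE (u)-[:OPENED]->(a) ',
        ')'
      ].join(''),
      parameters: { props: rows }
    }]
  };
}

var body = buildTxBody([{ _user_id: 'u1', _app_id: 'a1' }]);
console.log(body.statements.length); // 1
```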

Past that, Neo4j 2.1 is due to be released in the next few months, and it contains several performance enhancements that significantly speed up concurrent query execution. You may want to try the upcoming milestone to see how that helps your performance.

Upvotes: 3
