Poliakoff

Reputation: 1672

What is the best way to delete a node together with its connected nodes using Spring Data Neo4j?

I have a tree structure stored in a Neo4j database and need to delete a node along with all of its child nodes. I can think of two approaches so far:

  1. Write a custom query that deletes the node and all its children, along with their relationships, at the DB level.
  2. Delete the node itself, then run a recursive function on a separate thread to delete all of its children, along with their relationships, at the application level.

Is it possible to estimate the efficiency of these approaches and determine which one is faster without running a benchmark?

Upvotes: 0

Views: 540

Answers (2)

cybersam

Reputation: 66989

Approach #2, which uses threads to concurrently delete nodes/relationships in the same subgraph, is prone to errors, and should be avoided.

When deleting a relationship, neo4j's default locking mechanism will lock the relationship AND its endpoints; this can cause deadlock errors when multiple threads concurrently attempt to delete nodes/relationships in the same subgraph.

Also, a thread may discover that a node/relationship it is attempting to work on has disappeared (due to the actions of other threads).

Here is a sample Cypher query that uses approach #1. It should find all distinct nodes in a Foo/BAR tree and delete the tree (using DETACH DELETE, which @ToreEschliman also suggested):

[EDITED]

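// Match the root (zero hops) and every descendant reachable via BAR relationships.
MATCH (a:Foo {id: 123})-[:BAR*0..]->(b:Foo)
// Drop duplicates for nodes reachable by more than one path, then delete each node with its relationships.
WITH DISTINCT b
DETACH DELETE b;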
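
To make the query's semantics concrete, here is a minimal in-memory sketch in Java of what the variable-length match followed by DETACH DELETE does: collect the root plus every distinct node reachable over outgoing BAR edges, then remove those nodes and every edge touching them. The TreeGraph class and its methods are hypothetical names made up for illustration. They are not part of Neo4j or Spring Data Neo4j, and this is not how the database implements the operation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the graph (illustration only).
final class TreeGraph {
    // Outgoing BAR edges: parent id -> child ids.
    private final Map<Integer, List<Integer>> children = new HashMap<>();

    void addNode(int id) {
        children.putIfAbsent(id, new ArrayList<>());
    }

    void addBar(int parent, int child) {
        addNode(parent);
        addNode(child);
        children.get(parent).add(child);
    }

    // Mirrors MATCH (a {id})-[:BAR*0..]->(b): the root itself (zero hops)
    // plus every distinct node reachable over outgoing BAR edges.
    Set<Integer> reachableFrom(int rootId) {
        Set<Integer> seen = new LinkedHashSet<>();
        if (!children.containsKey(rootId)) {
            return seen; // no such root, so nothing matches
        }
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(rootId);
        while (!stack.isEmpty()) {
            int id = stack.pop();
            if (seen.add(id)) { // the set gives the DISTINCT behaviour
                for (int child : children.get(id)) {
                    stack.push(child);
                }
            }
        }
        return seen;
    }

    // Mirrors DETACH DELETE: removes the nodes and every edge touching them.
    void detachDelete(Set<Integer> ids) {
        children.keySet().removeAll(ids);       // drop nodes and their outgoing edges
        for (List<Integer> out : children.values()) {
            out.removeAll(ids);                 // drop incoming edges from surviving nodes
        }
    }

    // Snapshot of the ids of the nodes that still exist.
    Set<Integer> nodeIds() {
        return new HashSet<>(children.keySet());
    }
}
```

Calling detachDelete(reachableFrom(123)) on such a graph removes the whole subtree in one pass and leaves any parent of node 123 in place with its edge to 123 removed, which is the effect the single Cypher statement has inside one transaction.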

[UPDATE]

Based on new info from the comments, here is how to delete the entire tree rooted at a specific CodeSet node:

MATCH p=(root:CodeSet {id: 123})<-[*0..]-(node)
DETACH DELETE p;

The MATCH pattern used assumes that all the descendant nodes are connected via relationships directed towards the root node.

Upvotes: 1

Tore Eschliman

Reputation: 2507

You can estimate, yes, and you should estimate that the first one is always faster. If you can write a query that identifies all the nodes you want gone, just do so, then DETACH DELETE those nodes at the end. That is one transaction and one Cypher translation, and the rest is handled in specialized, purpose-built database code. If you can come up with a faster way to do it at the application level, you should be writing a competing database.

Upvotes: 1
