Reputation: 12085
I'm using Property Graph and Cypher from Neo4j. As described in the title, I'm trying to delete a number of nodes which can be reached from another node without passing other nodes and only have 1 incoming relationship. Here is the example of this case:
Each node has its label (big, bold character) and a property called nodeId
, which is unique among nodes. The labels of relationships are omitted because we cannot rely on it for some reasons. The nodeId
property is already indexed with a unique constraint.
Now, starting from node A {nodeId: 1}
, I want to delete it and all other nodes which:
A {nodeId: 1}
without passing another A-label node.So, the nodes will be deleted are: A {nodeId: 1}
, B {nodeId: 3}
, C {nodeId: 4}
, and C {nodeId: 8}
.
Below is my Cypher code:
MATCH p = (s:A {nodeId: 1 }) -[*1..10]-> (e)
WHERE NONE (x in NODES(p) WHERE x:A AND NOT x.nodeId = 1)
WITH s, e
MATCH (e) <-[r]- ()
WITH count(r) AS num_r, s, e
WHERE num_r < 2
DETACH DELETE e
DETACH DELETE s
The code works fine but as my graph grows, it becomes slower and slower. In the beginning, it takes less than 10 ms. But now, when I have around 1 million of nodes and 2 million of relationships, it takes more than 1 second.
What should I do to improve the performance of that code?
Thank you for your help.
Upvotes: 2
Views: 152
Reputation: 30397
Tezra has the right idea for the Cypher version of this query (but without usage of shortestPath()).
An alternate approach that might work better in a more complex graph is to use APOC Procedures, which has path expansion procs that will work well for your use case, finding only single paths to each distinct node, and filtering efficiently on labels.
Here's how you might use this, using apoc.path.subgraphNodes()
MATCH (s:A {nodeId: 1 })
CALL apoc.path.subgraphNodes(s, {maxLevel:10, labelFilter:'-A'}) YIELD node as e
WITH s, e
WHERE size((e)<--()) = 1
DETACH DELETE e
WITH distinct s
DETACH DELETE s
The labelFilter in the procedure call ensures that no node in the expansion has an :A
label (the filter doesn't apply to the starting node of the expansion by default, which works in your case here, though this is configurable).
EDIT
One flaw in this approach, however, is that this expands any relationship, in any direction.
While the relationshipFilter can filter by direction, there's currently a bug which won't allow us to specify only relationship direction without a type.
UPDATE
As of the APOC Summer releases in 2018 (3.3.0.4 along the 3.3.x line, and 3.4.0.2 along the 3.4.x line), you can now specify typeless, direction-only in the relationshipFilter: relationshipFilter:'>'
Upvotes: 1
Reputation: 8833
Since you only care if there is A path, you should use shortestPath instead of just (a)-[*]->(b)
. That way, Cypher just needs to find 1 valid path instead of all possible paths (This can be a life saver in larger sets) Also, you can use TAIL to cut off the first item in a list so that you can (Cypher can) skip that check.
Depending on your Neo4j version, Using MATCH <path> WHERE <stuff> WITH DISTINCT startnode ,endnode
may be more effective, as later Cypher Planners can use the WITH DISTINCT hint to do a faster, less exhaustive path matching. On earlier versions, this will hang Neo4j, and you will need to use the APOC neo4j library.
MATCH (s:A {nodeId: 1 })
WITH s MATCH p=shortestPath((s)-[*1..10]->(e))
WHERE NONE (x in TAIL(NODES(p)) WHERE x:A) AND NOT ()-->(e)<--()
WITH DISTINCT s, e
DETACH DELETE e
DETACH DELETE s
You can also change NOT ()-->(e)<--()
to SIZE(()-->(e)) < 2
if you need to change that number. The former may perform better in some Cypher Planners though. You may need to change that to "All parents of e are contained in path" if that is a scenario where e can have more than 2 incoming relationships but still need to be deleted.
If your logic gets more complicated than that (where what nodes get deleted can change what other nodes can be deleted
Upvotes: 1