Reputation: 1288
I would like to clean up my graph database a bit by removing unnecessary nodes. In one case, unnecessary nodes are nodes B between nodes A and C where B has the same name as node C and NO OTHER incoming relationships. I am having trouble coming up with a Cypher query that restricts the number of incoming edges.
The first part was easy:
MATCH (n1:TypeA)<-[r1:Inside]-(n2:TypeB)<-[r2:Inside]-(n3:TypeC)
WHERE n2.name = n3.name
Based on other SE questions (especially this one) I then tried doing something like:
WITH n2, collect(r2) as rr
WHERE length(rr) = 1
RETURN n2
but this also returned nodes with more than one incoming edge. It seems my WHERE
clause on the length is not filtering the returned n2
nodes. I tried a few other things I found online, but they either returned nothing or were no
longer syntactically correct in the current version.
After I find the n2
nodes that match the pattern, I'll want to connect n3
directly to n1
and DETACH DELETE n2
. Again, I was easily able to do that part when I didn't need the restriction on the number of incoming edges to n2
. That previous question has FOREACH (r IN rr | DELETE r)
, but I want to detach delete the n2
nodes, not just those edges. I don't know how to correctly adapt this to operating on the nodes attached to the r
s and I certainly want to be sure it's finding the correct nodes before deleting anything since Neo4j lacks basic undo functionality (but you can't put a RETURN
command inside a FOREACH
for some crazy reason).
How do I filter nodes on a path by the number of incoming edges using Cypher?
I think I can do this in py2neo by first collecting all the n1,n2,n3
triples matching the pattern, then going through each returned record and add them to a list if n2
has only one incoming edge. Then go through that list and perform the trimming operation, but if this can be done in pure Cypher, then I'd like to know how because I have a number of similar adjustments to make.
Upvotes: 0
Views: 771
Reputation: 41676
You need to pass along path
in your WITH
statement.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH path, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
RETURN path
Or shorter like this:
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
AND size((n2)<-[:PARTOF]-()) = 1
RETURN path
Upvotes: 1
Reputation: 1288
Borrowing some insight from this answer I came up with a kludge that seems to work.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH n2, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
MATCH (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
RETURN n1,n2,n3
I expect that not all parts of that are needed and it isn't an efficient solution, but I don't know enough yet to improve upon it.
For example, I define the first line as path
, but I can't use MATCH path
in the penultimate line, and I have no idea why.
Also, if I write WITH size((n2)<-[:PARTOF]-()) as degree
(dropping the n2,
after WITH) it returns not only the n2
with degree > 1, but also all the nodes connected to them (even more than the n3
nodes). I don't know why it behaves like this and the Neo4j documentation for WITH has no examples or description to help me understand why the n2
is necessary here. Any improvements to my Cypher query or explanation of how or why it works would be greatly appreciated.
Upvotes: 0