Reputation: 1243
I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want to delete all but one of each set of duplicated relationships, but I'm concerned that I have not thought of problematic cases that could result in unintended data deletion.
So, does this query delete all but one of a duplicated relationship?
MATCH (a)-->(b)<--(a) // identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b) // get all of the duplicated relationships themselves
WITH a, b, collect(r)[1..] AS rs // drop the first instance from the list
UNWIND rs AS r
DELETE r
If I replace the last two lines (UNWIND rs AS r and DELETE r) with
WITH a, b, count(rs) AS cnt RETURN cnt
it seems to return the unnecessary relationships.
I'm still reluctant to put this somewhere to be used by others, though...
Thanks
Upvotes: 0
Views: 778
Reputation: 67044
First of all, let me (strictly) define the term: "duplicate relationships". Two relationships are duplicates if they:
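1. Connect the same pair of nodes (a and b)
2. Have the same relationship type
3. Have the same properties (names and values)
4. Have the same direction between a and b (iff directionality is significant for the use case)

Your query only considers #1 and #4, so in general it could delete non-duplicate relationships as well.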
Here is a query that will take all of the above into consideration (assuming #4 should be included):
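MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
// group by type and properties too, so each set of duplicates keeps one survivor
WITH a, b, TYPE(r1) AS t, PROPERTIES(r1) AS p, apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs AS r
DELETE r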
Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate, redundant DISTINCT a, b test.
The APOC function apoc.coll.union returns the distinct union of its 2 input lists.
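To sanity-check the strict definition before running any DELETE, the grouping logic can be modelled outside the database. The following is only a minimal Python sketch, not Neo4j code: the tuple layout (id, start, end, type, properties) and the function name find_redundant are assumptions made for illustration, and property values are assumed to be hashable. It groups relationships by start node, end node, type and properties, keeps the first member of each group, and reports the rest as redundant.

```python
from collections import defaultdict


def find_redundant(rels):
    """Return the ids of relationships that duplicate an earlier one.

    Each relationship is a hypothetical tuple:
    (rel_id, start_node, end_node, rel_type, properties_dict).
    """
    groups = defaultdict(list)
    for rel_id, start, end, rel_type, props in rels:
        # Grouping key: same nodes, same direction, same type, same properties.
        key = (start, end, rel_type, frozenset(props.items()))
        groups[key].append(rel_id)

    redundant = []
    for ids in groups.values():
        # Keep the first relationship of each group, flag the others.
        redundant.extend(ids[1:])
    return redundant


rels = [
    (1, "a", "b", "KNOWS", {"since": 2020}),
    (2, "a", "b", "KNOWS", {"since": 2020}),  # duplicate of 1
    (3, "a", "b", "KNOWS", {"since": 2021}),  # different properties
    (4, "a", "b", "LIKES", {}),               # different type
    (5, "b", "a", "KNOWS", {"since": 2020}),  # opposite direction
]
print(find_redundant(rels))  # [2]
```

Grouping only by the node pair, as the query in the question does, would flag relationships 2, 3 and 4 in this sample, which is the over-deletion described above.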
Upvotes: 1