Deleting duplicate relationships in neo4j - is this correct?

Question

I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want to delete all but one of each set of duplicates, but I'm concerned that I have not thought of problematic cases that could result in unintended data deletion.

So, does this query delete all but one of each duplicated relationship?

MATCH (a)-->(b)<--(a)  // identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b)  // get all duplicated relationships themselves
WITH a, b, collect(r)[1..] as rs  // remove the first instance from the list
UNWIND rs as r
DELETE r

If I replace the UNWIND rs as r and DELETE r lines with WITH a, b, count(rs) as cnt RETURN cnt, it seems to return the unnecessary relationships.

I'm still reluctant to put this somewhere to be used by others, though.

Thanks

cybersam · Accepted Answer

First of all, let me (strictly) define the term: "duplicate relationships". Two relationships are duplicates if they:

1. Connect the same pair of nodes (call them a and b)
2. Have the same relationship type
3. Have exactly the same set of properties (both names and values)
4. Have the same directionality between a and b (iff directionality is significant for the use case)

Your query only considers #1 and #4, so it generally could delete non-duplicate relationships as well.
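
As a hypothetical illustration (the label, names, property and relationship types below are invented for this example), consider a pair of nodes joined by three relationships, none of which are true duplicates under the definition above:

```cypher
// Hypothetical test data: same pair of nodes, same direction, but no true duplicates.
CREATE (a:Person {name: 'Ann'}), (b:Person {name: 'Bob'}),
       (a)-[:KNOWS {since: 2019}]->(b),
       (a)-[:KNOWS {since: 2021}]->(b),   // same type, different properties (fails #3)
       (a)-[:WORKS_WITH]->(b)             // different type (fails #2)
```

Against this data, the pattern (a)-->(b)<--(a) still matches the pair, collect(r) holds all three relationships, and collect(r)[1..] would hand two of them to DELETE.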

Here is a query that will take all of the above into consideration (assuming #4 should be included):

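MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
// TYPE and PROPERTIES are also grouping keys, so each distinct set of duplicates keeps one relationship
WITH a, b, TYPE(r1) AS relType, PROPERTIES(r1) AS props,
     apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs as r
DELETE r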

Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate redundant DISTINCT a,b test.
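
The same grouping behaviour can be used for a read-only check before anything is deleted. The following is a minimal sketch rather than part of the original query: the aliases relType, props, rels and toDelete are illustrative names, and it assumes directionality matters (#4). It reports, per group of a, b, type and properties, how many relationships are surplus under the definition above.

```cypher
// Read-only dry run: nothing is deleted.
MATCH (a)-[r]->(b)
// a, b, relType and props are non-aggregated, so they act as the grouping keys for COLLECT.
WITH a, b, TYPE(r) AS relType, PROPERTIES(r) AS props, COLLECT(r) AS rels
// Keep only groups that actually contain duplicates.
WHERE SIZE(rels) > 1
RETURN a, b, relType, props, SIZE(rels) - 1 AS toDelete
```

If the reported groups look right, the delete query can be run with more confidence.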

The APOC function apoc.coll.union returns the distinct union of its 2 input lists.
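
As a small sketch of that behaviour (this assumes the APOC library is installed; the literal lists and the aliases merged and allButFirst are made up for illustration):

```cypher
// Values present in either list appear only once in the result.
RETURN apoc.coll.union([1, 2, 3], [3, 4]) AS merged,
       // The [1..] slice drops the first element of the merged list and keeps the rest.
       apoc.coll.union([1, 2, 3], [3, 4])[1..] AS allButFirst
```

The order of the merged list should not be relied on. For deduplication that does not matter, because every relationship in the list is interchangeable with the others.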
