frednotet

Reputation: 115

Replace nodes (and relationships) with Cypher

I have a large database with 700K properties. Some of them are duplicates, so I now want to clean this up.

I'm looking to replace one or more nodes with one or more other nodes, and to re-create every relationship of the replaced nodes on the new ones.

So I would like to do something like:

MATCH (p:Property) WHERE p.uid IN ['A6271DFB-F0FD-0DF1-6F22-67F7D3164AE6']
WITH p AS sources
MATCH (p2:Property) WHERE p2.uid IN ['51A26A14-74FB-BCFC-FE5C-661A43A9377C','8DCD063C-965D-CC12-6159-E287CD000954']
WITH sources, p2 AS destinations
OPTIONAL MATCH (sources)-[k]->(n) MERGE (destinations)-[y]->(n) SET y=k
WITH sources, destinations
OPTIONAL MATCH (sources)<-[s]-(n) MERGE (destinations)<-[w]-(n) SET w=s
WITH sources
DETACH DELETE sources

Of course, it doesn't work, because I need to specify the relationship type for the MERGE, but I don't actually know it (there could be several types and, depending on the type, the relationship could have properties).

So for now, I have written a PHP script that matches all nodes and relationships and, for each record, generates a MERGE query to link them to the new node. But that's slow and not optimized.

Does anyone have a clue about this?

Upvotes: 0

Views: 654

Answers (1)

cybersam

Reputation: 66999

The following query should work for you. It uses the APOC plugin's apoc.refactor.mergeNodes procedure to merge the nodes for you. That procedure is passed a collection of nodes. It merges the second through last nodes (and their relationships) onto the first node, deletes the second through last nodes, and returns the first node.

For illustrative purposes, both uid collections below contain multiple values; in a real-world scenario, both collections should be passed in as parameters.

MATCH (src:Property) WHERE src.uid IN ['A6271DFB-F0FD-0DF1-6F22-67F7D3164AE6','A6271DFB-F0FD-0DF1-6F22-123456789012']
WITH COLLECT(src) AS sources
MATCH (dest:Property) WHERE dest.uid IN ['51A26A14-74FB-BCFC-FE5C-661A43A9377C','8DCD063C-965D-CC12-6159-E287CD000954']
CALL apoc.refactor.mergeNodes([dest] + sources) YIELD node
RETURN node;

The query first collects all source nodes into a sources collection. Then, it matches each destination node and uses the apoc.refactor.mergeNodes procedure to merge all the source nodes into each destination node. (The [dest] + sources expression produces a new collection containing the dest node followed by the source nodes, in that order.) The query returns each (merged) destination node.
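As mentioned, the uid lists would normally be supplied as parameters rather than written into the query text. Here is a minimal sketch of the same query in that form. The parameter names $sourceUids and $destUids are only placeholders chosen for illustration, and the $name syntax assumes a Neo4j version that supports it.

```cypher
// $sourceUids and $destUids are hypothetical parameter names:
// each is expected to hold a list of uid strings supplied by the caller.
MATCH (src:Property) WHERE src.uid IN $sourceUids
WITH collect(src) AS sources
MATCH (dest:Property) WHERE dest.uid IN $destUids
// The first element of the list is the node that survives the merge.
CALL apoc.refactor.mergeNodes([dest] + sources) YIELD node
RETURN node;
```

Passing the lists as parameters lets the same query text be reused for every batch of duplicates, so your script only has to send new values instead of generating a new query for each record.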

Upvotes: 1
