Reputation: 1293
I have duplicate relationships between nodes, e.g.:
A ->{weight: 1} B
A ->{weight: 1} B
A ->{weight: 1} B
and I want to merge these relations into one relation of the form: A->{weight: 3} B for my whole graph.
I tried something like the following:
start n = node(*)
match (n)-[r:OCCURENCE]->()
Set r.weight = count(*)
count(*)
But my graph is really big, and with this query the edges are updated twice for each pair of nodes A and B. Furthermore, the old relationships are not deleted. I don't know how to handle these two aspects in one query. I hope someone can help.
EDIT:
I tried some other queries with node() and relationship(), e.g.:
start n = node(*) match ()-[r:OCCURENCE]->() set n.SumEdgeWeight = sum(r.weight)
They run horribly slowly. Is there a faster way when I need to update all nodes? I found this topic [1] in the Neo4j community. Is it possible that my queries would run faster with the Java core API?
[1] https://groups.google.com/forum/#!topic/neo4j/4SSxvNsuQsY
Regards.
Upvotes: 3
Views: 5151
Reputation: 9952
Instead of starting with a very general pattern that matches each node (node(*)), you can start with the more specific pattern that you are after (A-[:OCCURRENCE]->B). This might speed things up a bit.
Instead of counting nodes to arrive at an aggregate weight, you can aggregate the weight value itself (you seem to move towards that in your edit, but there you are setting the weight aggregate as a property on a node). Maybe all the relationships in your data have a weight of 1. If so, some kind of counting could work (you could try counting the relationships instead of the nodes), but it might be worth having a query that produces the right result by design rather than by accident. Such a query would also work with varying weight values, for instance if you import more data in the future and need to merge new [OCCURRENCE] relationships, perhaps with a weight of 1, with ones that are already merged and in place.
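To illustrate why summing is safer than counting, here is a minimal sketch in plain Python rather than Cypher. The list of (start, end, weight) tuples is a hypothetical in-memory stand-in for parallel OCCURRENCE relationships, not anything from Neo4j itself. It assumes one relationship was already merged to weight 3 and two new ones with weight 1 were imported afterwards.

```python
# Hypothetical stand-in for parallel relationships between A and B:
# each tuple is (start_node, end_node, weight).
rels = [
    ("A", "B", 3),  # result of an earlier merge
    ("A", "B", 1),  # newly imported
    ("A", "B", 1),  # newly imported
]

# Counting only gives the right weight when every weight happens to be 1.
count_based = len(rels)

# Summing the weight property stays correct when weights vary.
sum_based = sum(weight for _, _, weight in rels)

print(count_based, sum_based)
```

Here the count says 3 while the actual combined weight is 5, which is the case the SUM-based query is meant to handle.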
Could you try something like this?
MATCH (A)-[r:OCCURRENCE]->(B)
WITH A, COLLECT(r) as oldRels, B, SUM(r.weight) as W
FOREACH(r IN oldRels | DELETE r)
WITH A, W, B
CREATE (A)-[O:OCCURRENCE {weight:W}]->(B);
I take this query to mean something like: For all A-[r:OCCURRENCE]->B patterns in the graph, COLLECT the relationships and bring that collection WITH so they can be deleted later. Also bring WITH the related nodes and the SUM of the relationships' weight. FOREACH of the old relationships, delete it, and bring WITH only the two nodes and the aggregated weight. Create a new relationship and set the weight to the aggregated weight.
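The same steps can be mimicked outside the database to check the intended result on a small sample. This is only a sketch of the grouping logic in plain Python, not of how Neo4j executes the query. The function name merge_parallel_rels and the tuple layout are made up for illustration.

```python
from collections import defaultdict

def merge_parallel_rels(rels):
    """Group (start, end, weight) triples by directed node pair and sum the weights.

    Mirrors the query's shape: group by (A, B), aggregate the weights,
    then replace the old relationships with one new relationship per pair.
    """
    totals = defaultdict(int)
    for start, end, weight in rels:
        totals[(start, end)] += weight  # direction matters: (A, B) != (B, A)
    return [(start, end, total) for (start, end), total in totals.items()]

old_rels = [
    ("A", "B", 1),
    ("A", "B", 1),
    ("A", "B", 1),
    ("B", "C", 2),
]
merged = merge_parallel_rels(old_rels)
print(sorted(merged))
```

Each directed pair ends up with exactly one relationship carrying the summed weight, which is what the CREATE step at the end of the query is expected to leave in the graph.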
Upvotes: 9
Reputation: 459
Though this is an old question, there is some newer APOC functionality that can be used here. You need to install the APOC plugin that matches your version of Neo4j.
MATCH (A)-[r:OCCURRENCE]->(B)
WITH A,B,collect(distinct(r.weight)) as values, count(r) as relsCount
MATCH (A)-[r:OCCURRENCE]->(B)
WHERE size(values) = 1 AND relsCount > 1
WITH A,B,collect(r) as rels
CALL apoc.refactor.mergeRelationships(rels,{properties:"combine"})
YIELD rel RETURN rel
The "combine" option returns the weights of the duplicate relationships in an array, which you can then sum. Alternatively, you can first add the sum to the relationships as in the previous example, and then remove this property.
More documentation here
Upvotes: 2