In Neo4j, auto-generating relationships caused duplicate results later

Question

In the following code I manually create three nodes, then three directed relationships between them. When I query for all possible 'directed' combinations, I get what I expect: 7 combinations. Here is the code, which you can copy and paste into the Neo4j browser:

CREATE (a:Component {name:'A'})
CREATE (b:Component {name:'B'})
CREATE (c:Component {name:'C'})

CREATE (a)-[:CanExistWith]->(b),
       (a)-[:CanExistWith]->(c),
       (b)-[:CanExistWith]->(c)

WITH a,b,c
MATCH p = (:Component)-[*0..]->(:Component)
RETURN EXTRACT(n IN nodes(p)| n.name) AS component_sets

...and the correct result of 7 sets:

[A], [B], [C], [A,B], [A,C], [B,C], [A,B,C]

So that works fine; and with only 3 components (nodes), it's doable.

But if the graph had 20 components, I would have to manually create more than a million combined sets of relationships. Of course the REST client would not be able to handle that anyway.
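To put numbers on that growth, here is a rough Python sketch. It is not part of my Neo4j setup, and the function names are hypothetical, chosen only for illustration. It assumes the same pattern as above, where each component points to every component that comes after it, and it enumerates the directed paths the way the MATCH with [*0..] would.

```python
def directed_paths(names):
    """List every directed path when each node points to all later nodes."""
    # Assumption: edges mirror the question's pattern, (x)-[:CanExistWith]->(y)
    # whenever x comes before y in the list.
    successors = {name: names[i + 1:] for i, name in enumerate(names)}
    paths = []

    def extend(path):
        # A path of length 0 (a single node) counts too, like [*0..] in Cypher.
        paths.append(path)
        for nxt in successors[path[-1]]:
            extend(path + [nxt])

    for start in names:
        extend([start])
    return paths


def relationship_count(n):
    """Number of pairwise relationships among n components: n(n-1)/2."""
    return n * (n - 1) // 2


def path_count(n):
    """Number of non-empty component sets (directed paths): 2^n - 1."""
    return 2 ** n - 1


if __name__ == "__main__":
    for path in directed_paths(["A", "B", "C"]):
        print(path)
    print(relationship_count(20), path_count(20))
```

For 3 components this lists the same 7 sets as above. For 20 components the closed-form counts give 190 relationships and 1,048,575 sets.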
That's okay, Neo4j can automate that part. So let's keep the number of nodes at 3 and replace the middle chunk of code, switching from manually creating relationships to auto-generating them with MATCH + CREATE UNIQUE.

CREATE (a:Component {name:'A'})
CREATE (b:Component {name:'B'})
CREATE (c:Component {name:'C'})

WITH a,b,c
MATCH (x:Component), (y:Component)
WHERE id(x) < id(y)
CREATE UNIQUE (x)-[r:CanExistWith]->(y)

WITH x,y
MATCH p = (:Component)-[*0..]->(:Component)
RETURN EXTRACT(n IN nodes(p)| n.name) AS component_sets

If you run this and look at the graph it creates in the Neo4j browser, it is visually identical to the one above. Both have the same number of nodes and relationships, with the arrows pointing in the correct directions.

[Image: relationships auto-generated via MATCH + CREATE UNIQUE]

But this second graph actually behaves differently. When I query it for all possible unique directed combinations, I get duplicates:

[A], [A], [A], [B], [B], [B], [C], [C], [C], 
[A, B], [A, B], [A, C], [A, C], [A, C], [B, C],
[A, B, C]

There are 16 sets instead of 7. I know that I could use DISTINCT to clean up, but I didn't have to in the first example, and the number of duplicates explodes as the node count increases. DISTINCT should not be necessary here, because the path selection and any pruning should be able to happen at MATCH time. I would also expect that generating no duplicates in the first place makes for more efficient Cypher code.

So the question is: How can I change my graph structure or the auto-relationship-building-query to give me the same result as the first example?

(I am using Neo4j version 2.3.2)
