mojo2go

Reputation: 107

In Neo4j, auto-generating relationships caused duplicate results later

In the following code I manually create 3 nodes, then three directed relationships between them. When I query for all possible 'directed' combinations I get what I expect: 7 combinations. Here is the code you can cut & paste into a Neo4j browser...

CREATE (a:Component {name:'A'})
CREATE (b:Component {name:'B'})
CREATE (c:Component {name:'C'})

CREATE (a)-[:CanExistWith]->(b),  
       (a)-[:CanExistWith]->(c),  
       (b)-[:CanExistWith]->(c)

WITH a,b,c
MATCH p = (:Component)-[*0..]->(:Component)
RETURN EXTRACT(n IN nodes(p)| n.name) AS component_sets

..and the correct result of 7 sets:

[A], [B], [C], [A,B], [A,C], [B,C], [A,B,C]

So that works fine; and with only 3 components (nodes), it's doable.
[Figure: relationships individually specified by hand]

But if the graph had 20 components, I would have to manually create more than a million combined sets of relationships. Of course the REST client would not be able to handle that anyway.
That's okay, Neo4j can automate that part. So let's keep the number of nodes to 3 and change out that middle chunk of code from manually creating relationships to auto-generating them using the MATCH + CREATE UNIQUE clause.

CREATE (a:Component  {name:'A'})
CREATE (b:Component  {name:'B'})
CREATE (c:Component  {name:'C'})

WITH a,b,c    
MATCH (x:Component ), (y:Component )
WHERE id(x) < id(y)
CREATE UNIQUE (x)-[r:CanExistWith]->(y)

WITH x,y
MATCH p = (:Component )-[*0..]->(:Component )
RETURN EXTRACT(n IN nodes(p)| n.name) AS component_sets

If you run this and look at the graph it creates in the Neo4j browser, it's visually identical to the one above. Both have the same number of nodes and relationships, with arrows pointing in the correct directions.
[Figure: relationships auto-generated via MATCH + CREATE UNIQUE]

But this second graph actually behaves differently. When I query it for all possible unique directed combinations, I get duplicates:

[A], [A], [A], [B], [B], [B], [C], [C], [C], 
[A, B], [A, B], [A, C], [A, C], [A, C], [B, C],
[A, B, C]

There are 16 sets instead of 7. I know that I could use DISTINCT to clean up, but I didn't have to in the first example, and the number of duplicates explodes as the node count increases. DISTINCT should not be necessary here because the path selection, and any pruning, should be able to happen at MATCH time. I would also expect a query that generates no duplicates in the first place to be more efficient Cypher.

So the question is: How can I change my graph structure or the relationship-building query to give me the same result as the first example?

(I am using Neo4j version 2.3.2)

Upvotes: 1

Views: 372

Answers (1)

mojo2go

Reputation: 107

The problem is with the 'WITH'. The 'WITH' keyword does 2 things: it connects consecutive query parts, and it carries variables forward from one part to the next. In this case there are no variables that need to be carried forward. And it seems that it doesn't matter which variables are included with it: 'WITH x,y', or just 'WITH x' or 'WITH x,r,y' or just 'WITH r'. All of those produce 16 rows, most of them duplicates.
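One way to see what 'WITH' passes along is to return the rows at that point instead of continuing the query. The following is a minimal sketch, assuming an empty database and Neo4j 2.3.x syntax; the column names from_component and to_component are made up for illustration.

```cypher
// Same node creation and relationship generation as in the question.
CREATE (a:Component {name:'A'})
CREATE (b:Component {name:'B'})
CREATE (c:Component {name:'C'})

WITH a,b,c
MATCH (x:Component), (y:Component)
WHERE id(x) < id(y)
CREATE UNIQUE (x)-[r:CanExistWith]->(y)

// Return the rows that a following 'WITH x,y' would hand to the next MATCH:
// one row per (x, y) pair, so three rows for three components.
RETURN x.name AS from_component, y.name AS to_component
```

Any clause placed after 'WITH x,y' is evaluated once for each of those rows, whichever variables are listed. The counts in the question would be consistent with the path MATCH running three times while the relationships were still being added, seeing 4, then 5, then 7 paths: 4 + 5 + 7 = 16.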

The trick in this case is to leave out the 'WITH' clause entirely, since the duplication only appears when it is present. Without it the code won't run as one script, but the results are perfect! So here are the two scripts. Nothing is changed in the code, other than the missing 'WITH x,y'.

First run this:

CREATE (a:Component  {name:'A'})
CREATE (b:Component  {name:'B'})
CREATE (c:Component  {name:'C'})

WITH a,b,c    
MATCH (x:Component ), (y:Component )
WHERE id(x) < id(y)
CREATE UNIQUE (x)-[r:CanExistWith]->(y)

...then this:

MATCH p = (:Component )-[*0..]->(:Component )
RETURN EXTRACT(n IN nodes(p)| n.name) AS component_sets

..and it produces the correct result of 7 sets:

[A], [B], [C], [A,B], [A,C], [B,C], [A,B,C]
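If a single script is preferred, a possible alternative is to keep a 'WITH' but make it aggregate, so that only one row reaches the final MATCH. This is a sketch under assumptions, not something verified against 2.3.2: it assumes an empty database, and pairs_linked is a throwaway name introduced here.

```cypher
// One-script variant of the two scripts above (sketch only).
CREATE (a:Component {name:'A'})
CREATE (b:Component {name:'B'})
CREATE (c:Component {name:'C'})

WITH a,b,c
MATCH (x:Component), (y:Component)
WHERE id(x) < id(y)
CREATE UNIQUE (x)-[r:CanExistWith]->(y)

// count(*) with no grouping key collapses the three (x, y) rows into one,
// and the aggregation has to consume every row before it can emit a result,
// so the path MATCH below runs once, after all relationships exist.
WITH count(*) AS pairs_linked
MATCH p = (:Component)-[*0..]->(:Component)
RETURN EXTRACT(n IN nodes(p)| n.name) AS component_sets
```

The idea is the same as splitting the work into two scripts: the path query sees the finished graph and is evaluated only once.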

Upvotes: 1
