Mehran
Mehran

Reputation: 16931

How to ask Neo4j to take cycles into account

It seems Neo4j intentionally omits cycles so a query like:

MATCH (n1)-[:R]->(n2)<-[:R]-(n1)
RETURN n1, n2;

Always returns nothing unless there are two relations with type R between n1 and n2 (which is absolutely possible and a bad hack).

But I have a scenario in which this cycle might happen and it's desired:

MATCH (n1)-[:R]->(n2)<-[:R]-(n3)
RETURN n1, n2, n3;

In this query, n1 and n3 might be the same node or different ones and it all depends on the data (both are valid). One way to implement this is like:

MATCH (n1)-[:R]->(n2)
MATCH (n2)<-[:R]-(n3)
RETURN n1, n2, n3;

This last query will include all the paths even cycles and it's totally fine. But my case is even one level more complicated than this:

MATCH (n0)
OPTIONAL MATCH (n0)-[:R0]->(n1)-[:R]->(n2)<-[:R]-(n3)
RETURN n0, n1, n2, n3;

As we've seen before cycles are omitted in this queries so we have to break it down to two OPTIONAL MATCHes:

MATCH (n0)
OPTIONAL MATCH (n0)-[:R0]->(n1)-[:R]->(n2)
OPTIONAL MATCH (n2)<-[:R]-(n3)
RETURN n0, n1, n2, n3;

But this is not the same as before (if the other one worked). Here the second OPTIONAL MATCH might not return any path while the first one did. In other words, the two OPTIONAL MATCHes are independent.

So my question is: how can I implement the following query and get the cycles in the results?

MATCH (n0)
OPTIONAL MATCH (n0)-[:R0]->(n1)-[:R]->(n2)<-[:R]-(n3)
RETURN n0, n1, n2, n3;

I hope it's not too confusing!

Why two OPTIONAL MATCHes is not the answer

Consider the following nodes and relationships:

  1. CREATE (n0:S)-[:R]->(n1:R)<-[:R]-(n2:E)-[:R]->(n3:R:L);
  2. CREATE (n0:S)-[:R]->(n1:R:L)<-[:R]-(n2:E);
  3. CREATE (n0:S)-[:R]->(n1:R)<-[:R]-(n2:E);

In above examples, here are the meaning of the labels (so you can relate to the problem):

In this example each record of data is represented in :E+:R and when the record is updated, a new :R is added to it. The current state of the data is labeled :L so we can find the latest revision.

Now, among the three given examples, the last one is invalid data since it does not have any :L. The first one has two revisions and the second one has one.

The requested query should:

  1. Return :S regardless
  2. Return all the entities and their latest revision, only if they have a latest revision
  3. An entity without latest revision is meaningless, it should not be returned at all

This query would have returned the requested data if Neo4j supported cycles:

MATCH (n1:S)
OPTIONAL MATCH (n1)-[:R]->(n2:R)<-[:R]-(n3:E)-[:R]->(n4:R:L)
RETURN labels(n1), labels(n3), labels(n4);

The expected results for the above query are:

╒════════════╤════════════╤════════════╕
│"labels(n1)"│"labels(n3)"│"labels(n4)"│
╞════════════╪════════════╪════════════╡
│["S"]       │["E"]       │["R","L"]   │
├────────────┼────────────┼────────────┤
│["S"]       │["E"]       │["R","L"]   │
├────────────┼────────────┼────────────┤
│["S"]       │null        │null        │
└────────────┴────────────┴────────────┘

But the actual results are:

╒════════════╤════════════╤════════════╕
│"labels(n1)"│"labels(n3)"│"labels(n4)"│
╞════════════╪════════════╪════════════╡
│["S"]       │["E"]       │["R","L"]   │
├────────────┼────────────┼────────────┤
│["S"]       │null        │null        │
├────────────┼────────────┼────────────┤
│["S"]       │null        │null        │
└────────────┴────────────┴────────────┘

As you can see the second path is cut short because it included a cycle. Now if we use the two OPTIONAL MATCH approach:

MATCH (n1:S)
OPTIONAL MATCH (n1)-[:R]->(n2:R)<-[:R]-(n3:E)
OPTIONAL MATCH (n3)-[:R]->(n4:R:L)
RETURN labels(n1), labels(n3), labels(n4);

The results would be:

╒════════════╤════════════╤════════════╕
│"labels(n1)"│"labels(n3)"│"labels(n4)"│
╞════════════╪════════════╪════════════╡
│["S"]       │["E"]       │["R","L"]   │
├────────────┼────────────┼────────────┤
│["S"]       │["E"]       │["R","L"]   │
├────────────┼────────────┼────────────┤
│["S"]       │["E"]       │null        │
└────────────┴────────────┴────────────┘

While the second case is fixed, the third case is now the problem and that's because the two optional clauses can exist independently.

Sorry for the long question, I tried to be brief!

Upvotes: 3

Views: 210

Answers (3)

InverseFalcon
InverseFalcon

Reputation: 30417

If I'm understanding the question correctly, this query should give you the expected result table you're looking for:

MATCH (n1:S)
OPTIONAL MATCH (n1)-[:R]->(n2:R)<-[:R]-(n3:E)
WHERE (n3)-[:R]->(:R:L)
OPTIONAL MATCH (n3)-[:R]->(n4:R:L)
RETURN labels(n1), labels(n3), labels(n4)

The key is the WHERE we give to the first optional match. The OPTIONAL MATCH will only match if n3 has the desired path to a :R:L node, and it will count relationships and nodes that were already traversed by the OPTIONAL MATCH.

Upvotes: 1

cybersam
cybersam

Reputation: 67044

[UPDATED]

(A) New answer, based on new info in question.

MATCH (s:S)-[:R]->(firstRev:R)
OPTIONAL MATCH (firstRev)<-[:R]-(e:E)
OPTIONAL MATCH (e)-[:R]->(latestRev:L)
RETURN
  LABELS(s) AS ls,
  CASE WHEN latestRev IS NOT NULL THEN LABELS(e) END AS le,
  LABELS(latestRev) AS ll;

The above query uses 2 OPTIONAL MATCH clauses to allow a cycle (where the 2 R relationships are identical). The CASE clause will return a NULL le value if the second OPTIONAL MATCH failed.

(B) Original answer (obsolete)

Does this work for you?

OPTIONAL MATCH (n1)-[:R]->(n2)
OPTIONAL MATCH (n2)<-[:R]-(n3)
RETURN n1, n2, n3;

Upvotes: 1

stdob--
stdob--

Reputation: 29167

1) You can correct the version with two "option" with adding checks for a match of nodes:

MATCH (n1:S)
OPTIONAL MATCH (n1)-[:R]->(n2:R)<-[:R]-(n3:E)
OPTIONAL MATCH (n3t)-[:R]->(n4:R:L) WHERE n3t = n3
RETURN labels(n1), labels(n3t), labels(n4);

2) I think you are wrong to use the term cycle, and this gives rise to the impression that you have something not right with the data model. On your test data no cycles:

enter image description here

Upvotes: 1

Related Questions