Counting distinct undirected paths in Cypher

Question

I'm trying to understand exactly how Cypher returns undirected paths. Here are two queries and their results (they are the same except one uses an undirected pattern):

MATCH path = (a)-[]->(b) RETURN count(path), count(distinct(path))

count(path): 1236951

count(distinct(path)): 1236951

MATCH path = (a)-[]-(b) RETURN count(path), count(distinct(path))

count(path): 2473901

count(distinct(path)): 2473901

I have two specific questions about this.

a.) Why is count(path) in the undirected case not exactly twice what it is in the directed case (it's off by 1)?

b.) In the undirected case, why is count(distinct(path)) the same as count(path)? I would have expected Cypher to match each path twice, once in each direction, and then not count the duplicates in count(distinct(path)). Is it not counting the two directions as the same path?

cybersam · Accepted Answer

(a) I don't have an explanation for why the path count in your second query is not exactly double that of the first, other than a possible bug or data corruption. However, the query below should return any relationship that was not matched exactly twice by the undirected pattern. That might provide helpful clues.

MATCH path = (a)-[r]-(b)
WITH r, COUNT(*) AS num
WHERE num <> 2
RETURN r;
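
One thing worth ruling out, offered as a guess rather than a diagnosis: a relationship whose start node and end node are the same. As far as I know, an undirected pattern matches such a self-loop only once rather than twice, and a single self-loop would leave the undirected count exactly one short of double (2 × 1236951 = 2473902, versus the observed 2473901). A minimal sketch of the check, where nSelfLoops is just an illustrative alias:

```cypher
// Hypothetical check: count relationships that start and end on the same node.
// Reusing the variable n on both sides forces both endpoints to be the same node.
MATCH (n)-[r]->(n)
RETURN count(r) AS nSelfLoops;
```

If this returns 1, the off-by-one is most likely explained by that self-loop; if it returns 0, the cause lies elsewhere.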

(b) Each "path" is an ordered sequence of nodes separated by relationships. When you traverse the same relationship in opposite directions, the result is two different ordered sequences, (a, r, b) and (b, r, a), which are not equivalent, so DISTINCT does not collapse them. If you had instead counted distinct relationships, as below, you would have gotten an nRels value that is half of nPaths (or close to it, taking into account (a)):

MATCH path = (a)-[r]-(b)
RETURN count(path) AS nPaths, count(distinct r) AS nRels;
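
If the goal is to count each undirected path only once, one possible approach is to keep a single orientation of every match by comparing the endpoints' internal IDs. This is a sketch under assumptions: it uses id(), which newer Neo4j versions deprecate in favor of elementId(), and it uses <= rather than < so that a self-loop, where both endpoints are the same node, is still kept.

```cypher
// Keep only the orientation in which the first node's ID is not greater than
// the second's, so each relationship contributes one path instead of two.
MATCH path = (a)-[r]-(b)
WHERE id(a) <= id(b)
RETURN count(path) AS nPaths, count(distinct r) AS nRels;
```

Under those assumptions nPaths and nRels should come out equal, and both should match the count from the directed query in the question.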
