piwa

Neo4j Cypher query to find similar graphs

I have several disconnected graphs in a single database, and I am looking for a way to get a list of all similar graphs, grouped together.

For instance, I have the following three graphs:

[Image: Graph similarity]

As you can see, graphs 1 and 2 are similar, and graph 3 is different because its last node has Label_4 instead of Label_3 (as is the case for graphs 1 and 2). Therefore, I would like the query to return something like:

[a1->b1->c1,a2->b2->c2],[a3->b3->d3]

where a1->b1->c1 is graph 1, a2->b2->c2 is graph 2, and a3->b3->d3 is graph 3.

Is there a way to achieve this with Cypher? The result can also be represented differently, as long as it groups similar graphs (e.g., a list of node IDs, or only the start node IDs, would also be fine).

To create the example, I used the following commands:

CREATE (a1:Label_1 {name: "Label_1"})
CREATE (b1:Label_2 {name: "Label_2"})
CREATE (c1:Label_3 {name: "Label_3"})
CREATE (a2:Label_1 {name: "Label_1"})
CREATE (b2:Label_2 {name: "Label_2"})
CREATE (c2:Label_3 {name: "Label_3"})
CREATE (a3:Label_1 {name: "Label_1"})
CREATE (b3:Label_2 {name: "Label_2"})
CREATE (d3:Label_4 {name: "Label_4"})
CREATE (a1)-[:FOLLOWS]->(b1)
CREATE (b1)-[:FOLLOWS]->(c1)
CREATE (a2)-[:FOLLOWS]->(b2)
CREATE (b2)-[:FOLLOWS]->(c2)
CREATE (a3)-[:FOLLOWS]->(b3)
CREATE (b3)-[:FOLLOWS]->(d3)

Answers (1)

cybersam

If you are (A) trying to group complete paths (i.e., directed paths that start at a root node and end at a leaf node) and (B) only interested in one of the (possibly many) labels of each node, this query should work. However, because of the unbounded variable-length relationship, it could take a very long time or run out of memory in large databases:

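MATCH p = (n)-[*]->(m)
// keep only complete paths: n has no incoming and m has no outgoing relationships
WHERE NOT ()-->(n) AND NOT (m)-->()
// group the paths by the sequence of (first) labels along each path
RETURN [x IN nodes(p) | labels(x)[0]] AS labelPath, collect(p) AS paths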
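To make the grouping logic concrete outside Neo4j, here is a small Python sketch of what the query computes on the example data. It is only a model, not how Neo4j executes the query: the `LABELS`/`EDGES` structures and the function names are hypothetical, each node is assumed to carry a single label (matching `labels(x)[0]`), and the graph is assumed to be acyclic.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the example data:
# node id -> label, plus directed FOLLOWS edges as (source, target) pairs.
LABELS = {
    "a1": "Label_1", "b1": "Label_2", "c1": "Label_3",
    "a2": "Label_1", "b2": "Label_2", "c2": "Label_3",
    "a3": "Label_1", "b3": "Label_2", "d3": "Label_4",
}
EDGES = [
    ("a1", "b1"), ("b1", "c1"),
    ("a2", "b2"), ("b2", "c2"),
    ("a3", "b3"), ("b3", "d3"),
]


def root_to_leaf_paths(labels, edges):
    """Return every path from a root (no incoming edge) to a leaf (no outgoing edge).

    Assumes the graph is acyclic, as in the example.
    """
    out = defaultdict(list)
    has_incoming = set()
    for src, dst in edges:
        out[src].append(dst)
        has_incoming.add(dst)

    paths = []

    def walk(path):
        successors = out.get(path[-1], [])
        if not successors:  # leaf reached: NOT (m)-->()
            # Like (n)-[*]->(m), a path needs at least one relationship.
            if len(path) > 1:
                paths.append(tuple(path))
            return
        for nxt in successors:
            walk(path + [nxt])

    for node in labels:
        if node not in has_incoming:  # root only: NOT ()-->(n)
            walk([node])
    return paths


def group_by_label_path(labels, paths):
    """Group paths by their label sequence, mirroring RETURN labelPath, collect(p)."""
    groups = defaultdict(list)
    for path in paths:
        groups[tuple(labels[node] for node in path)].append(path)
    return dict(groups)


groups = group_by_label_path(LABELS, root_to_leaf_paths(LABELS, EDGES))
for label_path, members in groups.items():
    print(label_path, members)
```

On the example data this prints two groups: `('Label_1', 'Label_2', 'Label_3')` with the paths of graphs 1 and 2, and `('Label_1', 'Label_2', 'Label_4')` with the path of graph 3, which is the grouping the question asks for.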

You can drop constraint (A) by removing the WHERE clause, but then the result set will be much bigger, and both the running time and the risk of running out of memory will increase.
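To get a feel for how much bigger the result set gets, here is a rough Python model (not Neo4j code) that counts the paths `(n)-[*]->(m)` would match with and without the WHERE clause. It assumes an acyclic graph, and the `count_paths` function and edge lists are hypothetical. On a single chain of k nodes, dropping the filter turns 1 complete path into k(k-1)/2 sub-paths, so the growth is quadratic even for simple chains.

```python
from collections import defaultdict


def count_paths(edges, complete_only):
    """Count directed paths with at least one edge (like (n)-[*]->(m)).

    If complete_only is True, only root-to-leaf paths are counted, mirroring
    WHERE NOT ()-->(n) AND NOT (m)-->(). Assumes the graph is acyclic.
    """
    out = defaultdict(list)
    has_incoming = set()
    for src, dst in edges:
        out[src].append(dst)
        has_incoming.add(dst)
    nodes = set(out) | has_incoming
    count = 0

    def walk(node, hops):
        nonlocal count
        successors = out.get(node, [])
        # Count the path ending here if it has >= 1 hop and, when filtering, ends at a leaf.
        if hops >= 1 and (not complete_only or not successors):
            count += 1
        for nxt in successors:
            walk(nxt, hops + 1)

    for start in nodes:
        # When filtering, paths may only start at a root.
        if not complete_only or start not in has_incoming:
            walk(start, 0)
    return count


# The question's three 3-node chains.
example = [("a1", "b1"), ("b1", "c1"), ("a2", "b2"),
           ("b2", "c2"), ("a3", "b3"), ("b3", "d3")]
# A single hypothetical chain of 100 nodes.
long_chain = [(i, i + 1) for i in range(99)]

print(count_paths(example, complete_only=True))      # 3
print(count_paths(example, complete_only=False))     # 9
print(count_paths(long_chain, complete_only=True))   # 1
print(count_paths(long_chain, complete_only=False))  # 4950
```

Without the filter, the example also returns sub-paths such as a1->b1 and b1->c1, which then form extra label groups of their own.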