vy32
vy32

Reputation: 29655

Find all possible paths between two nodes on graph using a graph database

I have a collection of nodes that make up a DAG (directed acyclic graph) with no loops guaranteed. I want to store the nodes in a database and have the database execute a search that shows me all paths between two nodes.

For example, you could think that I have the git history of a complex project.

Each node can be described with a JSON object that has:

  {'id':'id',
   'outbound':['id1','id2','id3']}
  }

So if I had these nodes in the database:

  {'id':'id0',
   'outbound':['id1','id2']}
  }

  {'id':'id1',
   'outbound':['id2','id3','id4','id5,'id6']}
  }

  {'id':'id2',
   'outbound':['id2','id3'}
  }

And if I wanted to know all of the paths connecting id0 and id3, I would want to get three lists:

   id0 -> id1 -> id3
   id0 -> id2 -> id3
   id0 -> id1 -> id2 -> id3

I have thousands of these nodes today, I will have tens of thousands of them tomorrow. However, there are many DAGs in the database, and the typical DAG only has 5-10 nodes, so this problem is tractable.

I believe that there is no way to do this efficiently MySQL (right now all of the objects are stored in a table in a JSON column), however I believe that it is possible to do it efficiently in a graph database like Neo4j.

I've looked at the Neo4J documentation on Path Finding Algorithms and perhaps I'm confused, but the examples don't really look like working examples. I found a MySQL example which uses stored procedures and it doesn't look like it parallelizes very well. I'm not even sure what Amazon Neptune is doing; I think that it is using Spark GraphX.

I'm sort of lost as to where to start on this.

Upvotes: 0

Views: 896

Answers (2)

Mafor
Mafor

Reputation: 10681

It's perfectly doable with Neo4j.

Importing json data

[
  {"id":"id0",
   "outbound":["id1","id2"]
  },
  {"id":"id1",
   "outbound":["id2","id3","id4","id5","id6"]
  },
  {"id":"id2",
   "outbound":["id2","id3"]
  }
]
CALL apoc.load.json("graph.json") 
YIELD value
MERGE (n:Node {id: value.id})
WITH n, value.outbound AS outbound
UNWIND outbound AS o
MERGE (n2:Node {id: o}) 
MERGE (n)-[:Edge]->(n2)

enter image description here

Apparently the data you provided is not acyclic...

Getting all paths between two nodes

As you are not mentioning shortest paths, but all paths, there is no specific algorithm required:

MATCH p=(:Node {id: "id0"})-[:Edge*]->(:Node {id: "id3"}) RETURN nodes(p)
"[{""id"":id0},{""id"":id1},{""id"":id3}]"
"[{""id"":id0},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id1},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id2},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id1},{""id"":id2},{""id"":id2},{""id"":id3}]"

Comparaison with MySql

See how-much-faster-is-a-graph-database-really

Upvotes: 1

Tomaž Bratanič
Tomaž Bratanič

Reputation: 6514

The Graph Data Science library pathfinding algorithms are designed to find the shortest weighted paths and use algorithms similar to Dijkstra to find them. In your case, it seems that you are dealing with a directed unweighted graph and you could use the native cypher allShortestPath procedure:

An example would be:

MATCH (n1:Node{id:"A"}),(n2:Node{id:"B"})
MATCH path=allShortestPaths((n1)-[*..10]->(n2))
RETURN [n in nodes(path) | n.id] as outbound_nodes_id

It is always useful to check the Cypher refcard to see what is available with Cypher in Neo4j

Upvotes: 0

Related Questions