Find all possible paths between two nodes on graph using a graph database

Question

I have a collection of nodes that make up a DAG (directed acyclic graph) with no loops guaranteed. I want to store the nodes in a database and have the database execute a search that shows me all paths between two nodes.

For example, you could think that I have the git history of a complex project.

Each node can be described with a JSON object that has:

  {'id':'id',
   'outbound':['id1','id2','id3']}
  }

So if I had these nodes in the database:

  {'id':'id0',
   'outbound':['id1','id2']}
  }

  {'id':'id1',
   'outbound':['id2','id3','id4','id5,'id6']}
  }

  {'id':'id2',
   'outbound':['id2','id3'}
  }

And if I wanted to know all of the paths connecting id0 and id3, I would want to get three lists:

   id0 -> id1 -> id3
   id0 -> id2 -> id3
   id0 -> id1 -> id2 -> id3

I have thousands of these nodes today, I will have tens of thousands of them tomorrow. However, there are many DAGs in the database, and the typical DAG only has 5-10 nodes, so this problem is tractable.

I believe that there is no way to do this efficiently MySQL (right now all of the objects are stored in a table in a JSON column), however I believe that it is possible to do it efficiently in a graph database like Neo4j.

I've looked at the Neo4J documentation on Path Finding Algorithms and perhaps I'm confused, but the examples don't really look like working examples. I found a MySQL example which uses stored procedures and it doesn't look like it parallelizes very well. I'm not even sure what Amazon Neptune is doing; I think that it is using Spark GraphX.

I'm sort of lost as to where to start on this.

Mafor · Accepted Answer

It's perfectly doable with Neo4j.

Importing json data

[
  {"id":"id0",
   "outbound":["id1","id2"]
  },
  {"id":"id1",
   "outbound":["id2","id3","id4","id5","id6"]
  },
  {"id":"id2",
   "outbound":["id2","id3"]
  }
]

CALL apoc.load.json("graph.json") 
YIELD value
MERGE (n:Node {id: value.id})
WITH n, value.outbound AS outbound
UNWIND outbound AS o
MERGE (n2:Node {id: o}) 
MERGE (n)-[:Edge]->(n2)

Apparently the data you provided is not acyclic...

Getting all paths between two nodes

As you are not mentioning shortest paths, but all paths, there is no specific algorithm required:

MATCH p=(:Node {id: "id0"})-[:Edge*]->(:Node {id: "id3"}) RETURN nodes(p)

"[{""id"":id0},{""id"":id1},{""id"":id3}]"
"[{""id"":id0},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id1},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id2},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id1},{""id"":id2},{""id"":id2},{""id"":id3}]"

Comparaison with MySql

See how-much-faster-is-a-graph-database-really

Find all possible paths between two nodes on graph using a graph database

Answers (2)

Related Questions