Reputation: 29655
I have a collection of nodes that make up a DAG (directed acyclic graph) with no loops guaranteed. I want to store the nodes in a database and have the database execute a search that shows me all paths between two nodes.
For example, you could think that I have the git history of a complex project.
Each node can be described with a JSON object that has:
{'id':'id',
'outbound':['id1','id2','id3']}
}
So if I had these nodes in the database:
{'id':'id0',
'outbound':['id1','id2']}
}
{'id':'id1',
'outbound':['id2','id3','id4','id5,'id6']}
}
{'id':'id2',
'outbound':['id2','id3'}
}
And if I wanted to know all of the paths connecting id0
and id3
, I would want to get three lists:
id0 -> id1 -> id3
id0 -> id2 -> id3
id0 -> id1 -> id2 -> id3
I have thousands of these nodes today, I will have tens of thousands of them tomorrow. However, there are many DAGs in the database, and the typical DAG only has 5-10 nodes, so this problem is tractable.
I believe that there is no way to do this efficiently MySQL (right now all of the objects are stored in a table in a JSON column), however I believe that it is possible to do it efficiently in a graph database like Neo4j.
I've looked at the Neo4J documentation on Path Finding Algorithms and perhaps I'm confused, but the examples don't really look like working examples. I found a MySQL example which uses stored procedures and it doesn't look like it parallelizes very well. I'm not even sure what Amazon Neptune is doing; I think that it is using Spark GraphX.
I'm sort of lost as to where to start on this.
Upvotes: 0
Views: 896
Reputation: 10681
It's perfectly doable with Neo4j.
Importing json data
[
{"id":"id0",
"outbound":["id1","id2"]
},
{"id":"id1",
"outbound":["id2","id3","id4","id5","id6"]
},
{"id":"id2",
"outbound":["id2","id3"]
}
]
CALL apoc.load.json("graph.json")
YIELD value
MERGE (n:Node {id: value.id})
WITH n, value.outbound AS outbound
UNWIND outbound AS o
MERGE (n2:Node {id: o})
MERGE (n)-[:Edge]->(n2)
Apparently the data you provided is not acyclic...
Getting all paths between two nodes
As you are not mentioning shortest paths, but all paths, there is no specific algorithm required:
MATCH p=(:Node {id: "id0"})-[:Edge*]->(:Node {id: "id3"}) RETURN nodes(p)
"[{""id"":id0},{""id"":id1},{""id"":id3}]"
"[{""id"":id0},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id1},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id2},{""id"":id2},{""id"":id3}]"
"[{""id"":id0},{""id"":id1},{""id"":id2},{""id"":id2},{""id"":id3}]"
Comparaison with MySql
See how-much-faster-is-a-graph-database-really
Upvotes: 1
Reputation: 6514
The Graph Data Science library pathfinding algorithms are designed to find the shortest weighted paths and use algorithms similar to Dijkstra to find them. In your case, it seems that you are dealing with a directed unweighted graph and you could use the native cypher allShortestPath
procedure:
An example would be:
MATCH (n1:Node{id:"A"}),(n2:Node{id:"B"})
MATCH path=allShortestPaths((n1)-[*..10]->(n2))
RETURN [n in nodes(path) | n.id] as outbound_nodes_id
It is always useful to check the Cypher refcard to see what is available with Cypher in Neo4j
Upvotes: 0