Reputation: 4545
I am trying to write a query that retrieves all paths that are reachable from a specified vertex. In other words I am trying to retrieve the entire cluster/sub-graph that the vertex is connected to. A couple more constraints on the query are:
I currently have the following two queries which appear to work expected on small, toy graphs I have tested them on. However, there seem to be some edge cases in our large, production graph that does not return all the paths/edges/vertices I would expect it to, but I cannot explain as to why this happens. The two queries also sometimes return some different vertices than each other.
I would prefer a fresh view on how to approach this query, rather than trying to adjust what I currently have, so please try to provide a solution before looking at my current solution below.
Query 1:
g.V(uid).repeat(bothE().bothV().simplePath()).until(loops().is_(10)).emit().dedup().path().by(valueMap(True))
Query 2:
g.V(uid).repeat(bothE().bothV().simplePath()).until(bothE().simplePath().count().is_(0).or_().loops().is_(10)).dedup().path().by(valueMap(True))
Upvotes: 0
Views: 1313
Reputation: 14371
Using this simple binary tree as a test graph
g.addV('root').property('data',9).as('root').
addV('node').property('data',5).as('b').
addV('node').property('data',2).as('c').
addV('node').property('data',11).as('d').
addV('node').property('data',15).as('e').
addV('node').property('data',10).as('f').
addV('node').property('data',1).as('g').
addV('node').property('data',8).as('h').
addV('node').property('data',22).as('i').
addV('node').property('data',16).as('j').
addV('node').property('data',7).as('k').
addV('node').property('data',51).as('l').
addV('node').property('data',13).as('m').
addV('node').property('data',4).as('n').
addE('left').from('root').to('b').
addE('left').from('b').to('c').
addE('right').from('root').to('d').
addE('right').from('d').to('e').
addE('right').from('e').to('i').
addE('left').from('i').to('j').
addE('left').from('d').to('f').
addE('right').from('b').to('h').
addE('left').from('h').to('k').
addE('right').from('i').to('l').
addE('left').from('e').to('m').
addE('right').from('c').to('n').
addE('left').from('c').to('g').iterate()
We could find all the paths using
gremlin> g.V().hasLabel('root').
......1> repeat(bothE().otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> path().
......4> by('data').
......5> by(label)
==>[9,right,11,left,10]
==>[9,left,5,left,2,left,1]
==>[9,left,5,left,2,right,4]
==>[9,left,5,right,8,left,7]
==>[9,right,11,right,15,left,13]
==>[9,right,11,right,15,right,22,left,16]
==>[9,right,11,right,15,right,22,right,51]
Note that I used bothE().otherV()
as you said in your case you may have some incoming edges as well as outgoing ones.
We could also use the subgraph
step to return the whole sub graph containing both vertices and edges. This example finds the subtree that starts at the vertex for the value 5.
gremlin> g.V().has('data',5).
......1> repeat(bothE().subgraph('sg').otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> cap('sg')
==>tinkergraph[vertices:14 edges:13]
Note that both of these approaches assumes that all paths end at leaf nodes. I left out the loops()
test but you can add that in as needed.
Upvotes: 2