Reputation: 33809
We have a Cosmos graph database like below. ie. A,B,C,... are nodes/vertices and edges are as shown by the arrows.
Each node/vertex represents a value in SQL table. Process and the requirement are as follows.
I tried the following query and it has over 3000 RUs which is very costly.
g.V("A").emit().repeat(__.in('depends')).until(__.inE().count().is(0))
We need some help to optimise the query. thanks
UPDATE ===========
OK, we can rebuild the graph in a single partition to reduce the RUs but we have a scenario where multiple nodes are affected, highlighted in red in the below picture, on the way up starting from A.
Can someone help with a query to get the results in A, D, K, O, M, P order please? Logic to the query is all the child nodes should be listed before their parents
g.addV('ddn').property('pk', 'pk').property(id, 'A').property('formula', 'A').
addV('ddn').property('pk', 'pk').property(id, 'B').property('formula', 'B').
addV('ddn').property('pk', 'pk').property(id, 'C').property('formula', 'C').
addV('ddn').property('pk', 'pk').property(id, 'D').property('formula', 'A+B+C').property('requires', "'A','B','C'").
addV('ddn').property('pk', 'pk').property(id, 'E').property('formula', 'E').
addV('ddn').property('pk', 'pk').property(id, 'F').property('formula', 'E').
addV('ddn').property('pk', 'pk').property(id, 'G').property('formula', 'H+I').property('requires', "'H','I'").
addV('ddn').property('pk', 'pk').property(id, 'H').property('formula', 'H').
addV('ddn').property('pk', 'pk').property(id, 'I').property('formula', 'I').
addV('ddn').property('pk', 'pk').property(id, 'J').property('formula', 'F+G').property('requires', "'F','G'").
addV('ddn').property('pk', 'pk').property(id, 'K').property('formula', 'D+E+F').property('requires', "'D','E','F'").
addV('ddn').property('pk', 'pk').property(id, 'L').property('formula', 'L').
addV('ddn').property('pk', 'pk').property(id, 'M').property('formula', 'J+K').
addV('ddn').property('pk', 'pk').property(id, 'N').property('formula', 'N').
addV('ddn').property('pk', 'pk').property(id, 'O').property('formula', 'A+K').property('requires', "'A','K'").
addV('ddn').property('pk', 'pk').property(id, 'P').property('formula', 'L+M+N+O').property('requires', "'L','M','N','O'").
V('D').addE('needs').to(V('A')).
V('D').addE('needs').to(V('B')).
V('D').addE('needs').to(V('C')).
V('G').addE('needs').to(V('H')).
V('G').addE('needs').to(V('I')).
V('K').addE('needs').to(V('D')).
V('K').addE('needs').to(V('E')).
V('K').addE('needs').to(V('F')).
V('J').addE('needs').to(V('F')).
V('J').addE('needs').to(V('G')).
V('O').addE('needs').to(V('A')).
V('O').addE('needs').to(V('K')).
V('M').addE('needs').to(V('J')).
V('M').addE('needs').to(V('K')).
V('P').addE('needs').to(V('L')).
V('P').addE('needs').to(V('M')).
V('P').addE('needs').to(V('N')).
V('P').addE('needs').to(V('O'))
Upvotes: 1
Views: 732
Reputation: 46206
I think the answer boils down to being able to sort the vertices traversed by their path length.
gremlin> g.V("A").
......1> emit().repeat(__.in('needs')).path().
......2> group().
......3> by(tail(local)).
......4> by(count(local).fold()).
......5> order(local).
......6> by(select(values).tail(local)).
......7> select(keys)
==>[v[A],v[D],v[K],v[M],v[O],v[P]]
I group()
by the last element in the path() and transform each path in the group to its length with count(local)
. That allows me to order()
the results by the longest path for each vertex.
Note that I don't think you need until(__.inE().count().is(0))
because you're just traversing to path exhaustion in either case. Also, take care with __.inE().count().is(0)
as you end up counting all the edges just to detect a count of zero. Most graphs should optimize that to just until(inE())
, but it's always best to be explicit in my opinion. That said, you need to be sure of your data structures when using repeat()
- it takes just one edge of bad data to send your traversal into an infinity of traversing. Consider some kind of upper bound to your repeat()
that makes sense for your data so that the loop will terminate at some point.
Here is an alternative which actually might be better since it doesn't bother to hold all the counts in the Map
after the group()
:
gremlin> g.V("A").
......1> emit().repeat(__.in('needs')).path().
......2> group().
......3> by(tail(local)).
......4> by(count(local).order(local).tail(local)).
......5> order(local).
......6> by(values).
......7> select(keys)
==>[v[A],v[D],v[K],v[M],v[O],v[P]]
Upvotes: 2