Gremlin query to get affected nodes from the Cosmos graph

Question

We have a Cosmos graph database like below. ie. A,B,C,... are nodes/vertices and edges are as shown by the arrows.

Each node/vertex represents a value in SQL table. Process and the requirement are as follows.

User modifies node A value in SQL table
Gremlin query passes A into the Graph
Graph returns the following vertices in the below listed order
C# app calculates the values of D,K,M,P nodes in the order and updates SQL table

D = A+B+C
K = F+E+D
M = J+K
P = L+M+N+O

I tried the following query and it has over 3000 RUs which is very costly.

g.V("A").emit().repeat(__.in('depends')).until(__.inE().count().is(0))

We need some help to optimise the query. thanks

UPDATE ===========

OK, we can rebuild the graph in a single partition to reduce the RUs but we have a scenario where multiple nodes are affected, highlighted in red in the below picture, on the way up starting from A.

Can someone help with a query to get the results in A, D, K, O, M, P order please? Logic to the query is all the child nodes should be listed before their parents

g.addV('ddn').property('pk', 'pk').property(id, 'A').property('formula', 'A').
addV('ddn').property('pk', 'pk').property(id, 'B').property('formula', 'B').
addV('ddn').property('pk', 'pk').property(id, 'C').property('formula', 'C').
addV('ddn').property('pk', 'pk').property(id, 'D').property('formula', 'A+B+C').property('requires', "'A','B','C'").
addV('ddn').property('pk', 'pk').property(id, 'E').property('formula', 'E').
addV('ddn').property('pk', 'pk').property(id, 'F').property('formula', 'E').
addV('ddn').property('pk', 'pk').property(id, 'G').property('formula', 'H+I').property('requires', "'H','I'").
addV('ddn').property('pk', 'pk').property(id, 'H').property('formula', 'H').
addV('ddn').property('pk', 'pk').property(id, 'I').property('formula', 'I').
addV('ddn').property('pk', 'pk').property(id, 'J').property('formula', 'F+G').property('requires', "'F','G'").
addV('ddn').property('pk', 'pk').property(id, 'K').property('formula', 'D+E+F').property('requires', "'D','E','F'").
addV('ddn').property('pk', 'pk').property(id, 'L').property('formula', 'L').
addV('ddn').property('pk', 'pk').property(id, 'M').property('formula', 'J+K').
addV('ddn').property('pk', 'pk').property(id, 'N').property('formula', 'N').
addV('ddn').property('pk', 'pk').property(id, 'O').property('formula', 'A+K').property('requires', "'A','K'").
addV('ddn').property('pk', 'pk').property(id, 'P').property('formula', 'L+M+N+O').property('requires', "'L','M','N','O'").
V('D').addE('needs').to(V('A')).
V('D').addE('needs').to(V('B')).
V('D').addE('needs').to(V('C')).
V('G').addE('needs').to(V('H')).
V('G').addE('needs').to(V('I')).
V('K').addE('needs').to(V('D')).
V('K').addE('needs').to(V('E')).
V('K').addE('needs').to(V('F')).
V('J').addE('needs').to(V('F')).
V('J').addE('needs').to(V('G')).
V('O').addE('needs').to(V('A')).
V('O').addE('needs').to(V('K')).
V('M').addE('needs').to(V('J')).
V('M').addE('needs').to(V('K')).
V('P').addE('needs').to(V('L')).
V('P').addE('needs').to(V('M')).
V('P').addE('needs').to(V('N')).
V('P').addE('needs').to(V('O'))

stephen mallette · Accepted Answer

I think the answer boils down to being able to sort the vertices traversed by their path length.

gremlin> g.V("A").
......1>   emit().repeat(__.in('needs')).path().
......2>   group().
......3>     by(tail(local)).
......4>     by(count(local).fold()).
......5>   order(local).
......6>     by(select(values).tail(local)).
......7>   select(keys)
==>[v[A],v[D],v[K],v[M],v[O],v[P]]

I group() by the last element in the path() and transform each path in the group to its length with count(local). That allows me to order() the results by the longest path for each vertex.

Note that I don't think you need until(__.inE().count().is(0)) because you're just traversing to path exhaustion in either case. Also, take care with __.inE().count().is(0) as you end up counting all the edges just to detect a count of zero. Most graphs should optimize that to just until(inE()), but it's always best to be explicit in my opinion. That said, you need to be sure of your data structures when using repeat() - it takes just one edge of bad data to send your traversal into an infinity of traversing. Consider some kind of upper bound to your repeat() that makes sense for your data so that the loop will terminate at some point.

Here is an alternative which actually might be better since it doesn't bother to hold all the counts in the Map after the group():

gremlin> g.V("A").
......1>   emit().repeat(__.in('needs')).path().
......2>   group().
......3>     by(tail(local)).
......4>     by(count(local).order(local).tail(local)).
......5>   order(local).
......6>     by(values).
......7>   select(keys)
==>[v[A],v[D],v[K],v[M],v[O],v[P]]

Gremlin query to get affected nodes from the Cosmos graph

Answers (1)

Related Questions