Can someone explain what this graph traversal in Gremlin is doing?

Question

I'm having a bit of trouble understanding these Gremlin queries:

from os import getenv
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# g (the remote traversal source), name and value are defined in setup code omitted here
pmap = g.V().has(name, value) \
            .union(__.hasLabel('UID'),
                   __.hasLabel('OID').outE('attached').inV()) \
            .union(__.propertyMap(),
                   __.inE('attached').outV().hasLabel('OID') \
                                     .propertyMap()).toList()

So I understand that g.V().has(name, value) is looking for a vertex whose property key name has the given value. What is the union doing here? Is it unioning vertices with the label "OID" with edges that go outward with the label "attached"? What is the inV(), and why are there two arguments to union?

stephen mallette · Accepted Answer

The union() step just merges the streams of the child traversals that are passed to it as arguments. Take a simpler example:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('person','name','marko').union(has('age',29),bothE())
==>v[1]
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V().has('person','name','marko').union(has('age',30),bothE())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]

In the first example, union() takes the "marko" vertex as the starting point for both has('age',29) and bothE(). As v[1] has an "age" property with a value of 29, we see v[1] in the output. We also see all the edges of v[1] merged into that output stream. In the second traversal, v[1] is filtered out because its "age" is not equal to 30, so all we get are the edges.
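To make the merging concrete, here is a toy model of union() in plain Python (not gremlin_python). It assumes a hand-encoded slice of the modern graph containing only marko and his three edges, and models each child traversal as a hypothetical function from one element to the list of elements it emits. It treats one input at a time; the real step's ordering across many inputs may differ.

```python
# Toy model of union() on the "marko" example; plain Python, not gremlin_python.
# Only marko (v[1]) and his three edges from the modern graph are encoded here.
from typing import Callable, Dict, List

PROPERTIES: Dict[str, Dict[str, object]] = {"v[1]": {"name": "marko", "age": 29}}
INCIDENT_EDGES: Dict[str, List[str]] = {
    "v[1]": ["e[9][1-created->3]", "e[7][1-knows->2]", "e[8][1-knows->4]"],
}

# A "step" maps one input element to the list of elements it emits.
Step = Callable[[str], List[str]]

def has(key: str, value: object) -> Step:
    """Filter: emit the vertex only if property `key` equals `value`."""
    return lambda v: [v] if PROPERTIES[v].get(key) == value else []

def both_e() -> Step:
    """Emit every edge incident to the vertex, in either direction."""
    return lambda v: list(INCIDENT_EDGES.get(v, []))

def union(*children: Step) -> Step:
    """Give the same input to every child and concatenate their outputs."""
    return lambda v: [out for child in children for out in child(v)]

marko = "v[1]"
print(union(has("age", 29), both_e())(marko))  # v[1] plus its three edges
print(union(has("age", 30), both_e())(marko))  # only the three edges
```

Both children always see the same marko vertex; union() never chooses between them, it simply keeps whatever each one emits.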

With that explanation in mind, consider what the traversal in your question is doing. It finds a vertex with the given value for the "name" key. That vertex becomes the starting point for the first union(). If the vertex has the label "UID", it passes through. If the vertex has the label "OID", the traversal follows its outgoing "attached" edges to the adjacent vertices and returns those.
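A sketch of just that first union(), again as a plain-Python toy model rather than gremlin_python. The vertex ids, labels and edges below are hypothetical, invented only to exercise the three cases: a "UID" vertex, an "OID" vertex with two "attached" edges, and a vertex with some other label.

```python
# Toy model of the question's first union(); plain Python, not gremlin_python.
# Hypothetical data: vertex ids, labels and edges are invented for illustration.
from typing import Callable, Dict, List, Tuple

LABELS: Dict[str, str] = {"u1": "UID", "o1": "OID", "a1": "Thing", "a2": "Thing"}
# (out_vertex, edge_label, in_vertex)
EDGES: List[Tuple[str, str, str]] = [("o1", "attached", "a1"), ("o1", "attached", "a2")]

Step = Callable[[str], List[str]]

def has_label(label: str) -> Step:
    """Filter: keep the vertex only if its label matches."""
    return lambda v: [v] if LABELS[v] == label else []

def out_attached() -> Step:
    """Models outE('attached').inV(): follow outgoing "attached" edges to their heads."""
    return lambda v: [head for tail, lbl, head in EDGES if tail == v and lbl == "attached"]

def chain(*steps: Step) -> Step:
    """Run steps in sequence, feeding every output of one step into the next."""
    def run(v: str) -> List[str]:
        current = [v]
        for step in steps:
            current = [out for item in current for out in step(item)]
        return current
    return run

def union(*children: Step) -> Step:
    """Give the same input to every child and concatenate their outputs."""
    return lambda v: [out for child in children for out in child(v)]

# union(hasLabel('UID'), hasLabel('OID').outE('attached').inV())
first_union = union(has_label("UID"), chain(has_label("OID"), out_attached()))

for start in ("u1", "o1", "a1"):
    print(start, "->", first_union(start))
```

Because each vertex carries a single label, at most one of the two children ever emits anything for a given start vertex.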

What's odd about that is that a vertex can only have one label (at least by TinkerPop's definition; some graphs support multiple element labels). So, assuming one label, you really only get one stream or the other. Personally, I don't think union() is a good choice there. I think coalesce() would be more intuitive, since it returns only the output of the first child traversal that produces results. Expanding my example from above:

gremlin> g.V().has('person','name','marko').coalesce(has('age',30),has('age',29).bothE())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V().has('person','name','marko').coalesce(has('age',29),has('age',29).bothE())
==>v[1]
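The same toy approach can model coalesce(): try each child in order and keep the output of the first one that emits anything. This is a plain-Python sketch, not gremlin_python, and it assumes the same hand-encoded slice of the modern graph (marko and his three edges) as a stand-in for the console session.

```python
# Toy model of coalesce() on the "marko" example; plain Python, not gremlin_python.
from typing import Callable, Dict, List

PROPERTIES: Dict[str, Dict[str, object]] = {"v[1]": {"name": "marko", "age": 29}}
INCIDENT_EDGES: Dict[str, List[str]] = {
    "v[1]": ["e[9][1-created->3]", "e[7][1-knows->2]", "e[8][1-knows->4]"],
}

Step = Callable[[str], List[str]]

def has(key: str, value: object) -> Step:
    """Filter: emit the vertex only if property `key` equals `value`."""
    return lambda v: [v] if PROPERTIES[v].get(key) == value else []

def both_e() -> Step:
    """Emit every edge incident to the vertex, in either direction."""
    return lambda v: list(INCIDENT_EDGES.get(v, []))

def chain(*steps: Step) -> Step:
    """Run steps in sequence, feeding every output of one step into the next."""
    def run(v: str) -> List[str]:
        current = [v]
        for step in steps:
            current = [out for item in current for out in step(item)]
        return current
    return run

def coalesce(*children: Step) -> Step:
    """Return the output of the first child that emits anything; later children are never run."""
    def run(v: str) -> List[str]:
        for child in children:
            out = child(v)
            if out:
                return out
        return []
    return run

marko = "v[1]"
print(coalesce(has("age", 30), chain(has("age", 29), both_e()))(marko))  # the three edges
print(coalesce(has("age", 29), chain(has("age", 29), both_e()))(marko))  # just v[1]
```

Unlike the union() model, the second child is skipped as soon as the first one produces output, which is why only one stream can come back.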

In my opinion, coalesce() makes the intent much clearer. Moving on to the second union() in the original code: at this point you have either the original vertex or one or more "attached" vertices. For each of those, the traversal merges its own propertyMap() with the propertyMap() of any additional vertices that have an "OID" label and an outgoing "attached" edge pointing to it.
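Putting both union() steps together, here is a plain-Python toy model of the whole traversal (not gremlin_python). The data is hypothetical: two "OID" vertices attached to the same vertex a1, plus a lone "UID" vertex. The has(name, value) lookup is modelled by passing the start vertex id directly, and propertyMap() is simplified to key -> list of plain values, whereas the real step returns property objects.

```python
# Toy model of the question's whole traversal (both union() steps).
# Plain Python, not gremlin_python; vertex ids, labels and properties are hypothetical.
from typing import Any, Callable, Dict, List, Tuple

# vertex id -> (label, properties)
VERTICES: Dict[str, Tuple[str, Dict[str, Any]]] = {
    "u1": ("UID", {"name": "u1"}),
    "o1": ("OID", {"name": "o1"}),
    "o2": ("OID", {"name": "o2"}),
    "a1": ("Thing", {"name": "a1"}),
}
# (out_vertex, edge_label, in_vertex)
EDGES: List[Tuple[str, str, str]] = [("o1", "attached", "a1"), ("o2", "attached", "a1")]

Step = Callable[[Any], List[Any]]

def has_label(label: str) -> Step:
    return lambda v: [v] if VERTICES[v][0] == label else []

def out_attached() -> Step:
    """Models outE('attached').inV(): follow outgoing "attached" edges."""
    return lambda v: [i for o, lbl, i in EDGES if o == v and lbl == "attached"]

def in_attached() -> Step:
    """Models inE('attached').outV(): follow incoming "attached" edges backwards."""
    return lambda v: [o for o, lbl, i in EDGES if i == v and lbl == "attached"]

def property_map() -> Step:
    """Simplified propertyMap(): key -> list of plain values."""
    return lambda v: [{k: [val] for k, val in VERTICES[v][1].items()}]

def chain(*steps: Step) -> Step:
    """Run steps in sequence, feeding every output of one step into the next."""
    def run(x: Any) -> List[Any]:
        current = [x]
        for step in steps:
            current = [out for item in current for out in step(item)]
        return current
    return run

def union(*children: Step) -> Step:
    """Give the same input to every child and concatenate their outputs."""
    return lambda x: [out for child in children for out in child(x)]

pipeline = chain(
    union(has_label("UID"), chain(has_label("OID"), out_attached())),
    union(property_map(), chain(in_attached(), has_label("OID"), property_map())),
)

print(pipeline("o1"))  # a1's map, then the maps of every OID vertex attached to a1
print(pipeline("u1"))  # only u1's own map
```

Starting from o1, the traversal steps out to a1 and then back in again, so the result includes o1's own map alongside the other OID vertex o2; that round trip is one reason the intent is hard to read from the query alone.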

It's really hard to say exactly what the intent of this traversal is given the information provided. Depending on what the data structure is and what the intent is, I imagine that things could be simplified. Hopefully, I've at least explained what union() is doing and clarified that for you as that seemed to be the core of your question.
