Wolfgang Fahl

Reputation: 15594

Why do the API calls not work in Gremlin Python?

In gremlin-python I can do:

for e in g.E().toList():
        print(e)

and will get a result like

e[11][4-created->3]
e[12][6-created->3]
e[7][1-knows->2]
e[8][1-knows->4]
e[9][1-created->3]
e[10][4-created->5]

According to

http://tinkerpop.apache.org/javadocs/3.4.3/core/org/apache/tinkerpop/gremlin/structure/Edge.html

an Edge has an inVertex() accessor. Translating this idea to Python leads to:

for e in g.E().toList():
        print (e.inVertex().id)

and the error

AttributeError: 'Edge' object has no attribute 'inVertex'

The same holds true for quite a few other "simple" API calls. For example,

for e in g.E().toList():
        print(e.property('weight'))

also fails.

Why is this so, and what is the workaround?

Upvotes: 0

Views: 1589

Answers (3)

Kfir Dadosh

Reputation: 1419

toList() executes the Gremlin query and packs the result into a list. Thus, you cannot continue the traversal with inVertex().

To get the entering vertices you should run:

for v in g.E().inV().toList():
        print(v)

To get the edge properties and both vertices properties in a single query, you can use project:

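from gremlin_python.process.graph_traversal import __

# True asks valueMap to include id and label alongside the properties
rows = (g.E().project("values", "in", "out")
        .by(__.valueMap(True))
        .by(__.inV().valueMap(True))
        .by(__.outV().valueMap(True))
        .toList())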
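
As a rough sketch of how such rows might be consumed on the Python side: this assumes each row comes back as a dict keyed by the three names passed to project(), and that vertex property values arrive wrapped in lists, as valueMap() does for vertices. The helper name describe_row and the hand-written sample row are illustrative only; the "name" and "weight" keys are taken from the TinkerPop "modern" toy graph that the question's output appears to come from.

```python
def describe_row(row):
    """Summarize one project("values", "in", "out") row as a short string."""
    out_name = row["out"]["name"][0]      # vertex values are lists in valueMap
    in_name = row["in"]["name"][0]
    weight = row["values"].get("weight")  # None if the edge has no weight
    return "{} -> {} (weight={})".format(out_name, in_name, weight)


# Hand-written sample standing in for one element of the query result.
sample_row = {
    "values": {"weight": 0.4},
    "in": {"name": ["lop"], "lang": ["java"]},
    "out": {"name": ["josh"], "age": [32]},
}
print(describe_row(sample_row))
```

With a live connection, the same helper would be applied to each element of the list returned by the project() query.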

Upvotes: 1

stephen mallette

Reputation: 46226

In TinkerPop, graph elements (e.g. vertices, edges, vertex properties) often go through a process of "detachment". Gremlin traversals that return graph elements from remote sources go through this process and, in these cases, are typically detached to "references". A reference provides just enough information to re-attach to the remote graph. For that process of re-attachment it only needs id and label. Therefore, properties are not returned. It is the same for all languages that Gremlin supports, not just Python (though I will contradict this statement a bit in a final note at the end).

Speaking specifically of Gremlin Language Variants like Python: these implementations of Gremlin do not have a full Gremlin Virtual Machine to process traversals, and it was never the intent to build full graph structures on the Python side, only graph elements with references to match what would be returned from remote sources. That also reduces the amount of code on the Python side that needs to be maintained, because TinkerPop can rely on standard primitives like Dictionary, List, etc. that exist in all programming languages.

Technical history aside, the return of references forces users to write better Gremlin according to best practices. Users should specify exactly what data they want in their Gremlin traversal. Rather than:

g.V().hasLabel('customer')

you would prefer:

g.V().hasLabel('customer').valueMap(true,'name')

or in 3.4.4:

g.V().hasLabel('customer').elementMap('name')

which returns a less nested structure than valueMap(). elementMap() works very nicely for edges and is a replacement for more complex approaches via project() to get the data you're requesting from an edge in your question:

gremlin> g.V().has('person','name','marko').elementMap()
==>[id:1,label:person,name:marko,age:29]
gremlin> g.V().has('person','name','marko').elementMap('name')
==>[id:1,label:person,name:marko]
gremlin> g.V().has('person','name','marko').properties('name').elementMap()
==>[id:0,key:name,value:marko]
gremlin> g.E(11).elementMap()
==>[id:11,label:created,IN:[id:3,label:software],OUT:[id:4,label:person],weight:0.4]
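
To illustrate what "less nested" means here: valueMap() wraps each vertex property value in a list, while elementMap() returns the values directly. On versions without elementMap(), a small client-side helper could approximate the flatter shape. This is only a sketch; the name flatten_value_map is made up, and the sample dict is hand-written with plain string keys, whereas gremlin-python would use the T.id and T.label enum members as keys for id and label.

```python
def flatten_value_map(value_map):
    """Unwrap single-element lists so the dict resembles elementMap() output."""
    flat = {}
    for key, value in value_map.items():
        if isinstance(value, list) and len(value) == 1:
            flat[key] = value[0]
        else:
            flat[key] = value  # keep multi-valued properties and scalars as-is
    return flat


# Hand-written stand-in for a valueMap(True) result on the "marko" vertex.
nested = {"id": 1, "label": "person", "name": ["marko"], "age": [29]}
print(flatten_value_map(nested))
```

Multi-valued properties are deliberately left as lists, since collapsing them would silently drop data.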

It's really no different in SQL where you likely wouldn't do:

SELECT * FROM customer

but instead:

SELECT name FROM customer

Returning references and forcing users to be a bit more explicit about what they return also solves a massive problem with multi/meta-properties. If a user returns vertices and inadvertently returns a "fat" vertex (e.g. a vertex with 1 million properties on it), it will have a significant impact on the server as it tries to return that. By detaching to references, there is no loophole for users to get stuck in.

All that said, as of 3.4.3, there are still points of inconsistency with detachment, and in some cases in Java detachment works in other ways beyond reference detachment. TinkerPop has been trying to become completely consistent in this approach, but has been trying to do it in a fashion that does not break existing code within existing release lines. This probably isn't the answer you're looking for, but at least it helps explain some of the reasoning and history for why things are as they are.

Upvotes: 3

Wolfgang Fahl

Reputation: 15594

Looking at the source code at https://github.com/apache/tinkerpop/blob/master/gremlin-python/src/main/jython/gremlin_python/structure/graph.py (see below) the following properties are directly accessible:

for all elements:

e.id
e.label

for edges:

e.inV
e.outV

The bad news is that properties first need to be retrieved with a separate traversal, so it is not so easy to access ids, labels and properties in a single Python statement.

class Element(object):
    def __init__(self, id, label):
        self.id = id
        self.label = label

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.id == other.id

    def __hash__(self):
        return hash(self.id)


class Vertex(Element):
    def __init__(self, id, label="vertex"):
        Element.__init__(self, id, label)

    def __repr__(self):
        return "v[" + str(self.id) + "]"


class Edge(Element):
    def __init__(self, id, outV, label, inV):
        Element.__init__(self, id, label)
        self.outV = outV
        self.inV = inV

    def __repr__(self):
        return "e[" + str(self.id) + "][" + str(self.outV.id) + "-" + self.label + "->" + str(self.inV.id) + "]"
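
Based on the attributes in the listing above, the failing e.inVertex().id from the question can be written as e.inV.id. The following sketch collects everything a reference edge carries. The helper name edge_summary is made up, and SimpleNamespace objects stand in for real gremlin-python Edge and Vertex instances so the snippet runs without a server; it relies only on the id, label, outV and inV attributes shown in the source.

```python
from types import SimpleNamespace


def edge_summary(e):
    """Collect what a reference edge carries: its id, label and endpoint ids."""
    return {
        "id": e.id,
        "label": e.label,
        "out": e.outV.id,  # attribute, not a method call like outVertex()
        "in": e.inV.id,    # attribute, not a method call like inVertex()
    }


# Stand-in for the first edge printed in the question, e[11][4-created->3].
edge = SimpleNamespace(id=11, label="created",
                       outV=SimpleNamespace(id=4), inV=SimpleNamespace(id=3))
print(edge_summary(edge))
```

Against a live graph, the same helper would be called as edge_summary(e) for each e in g.E().toList().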

Upvotes: 0
