ggendel
ggendel

Reputation: 413

ArangoDB efficient traversal via node attributes

In OrientDB, each vertex has connected edges attached. This means that it possible to explicitly walk nodes from a collection by using nested "select" statements.

As an example: Given a path of node attributes, find the matching end node. The path consists of a list of node attributes (for example, the kind is unique in the nodes in a path).

Now, suppose I had a tree:

kind=site, name=site1
  -- kind=project, name=project1
  -- kind=library, name=library1
kind=site, name=site2
  -- kind=project, name=project2
  -- kind=library, name=library1

A user wants the information from a library called library1 with the path:

[{kind=site},{kind=project,name=project1},{kind=library,name=library1}]

Not having each node fully qualified for traversal is okay as long as the result is a single node.

In OrientDB, the process would be:

This can be done in a nested select statement. The kind field is indexed so the starting nodes are collected quickly from a large number of objects. To further improve performance, I know which kinds are in which tables (collections) so I could scope the number of objects to select from (select from <table> where kind=site).

In ArrangoDB, only the edges have the node binding information so having a node I can't directly walk through a connected edge. In the GRAPH_TRAVERSAL function, I can specify the starting collection by example. So the example would be {kind=site}. Doesn't that mean the starting list of nodes has to be gleaned by scanning all of the graph edges, basically looking at every node connected to the input of each edge in the entire graph?

How would such a query be formulated (in AQL and/or arangojs) so it wouldn't have to start with a full scan of the objects connected to the edges?

I also don't see how to send an example vertex into the arangojs traversal function. It seems to always want an explicit starting vertex.

Upvotes: 3

Views: 567

Answers (1)

dothebart
dothebart

Reputation: 6067

With jobs like this in mind we designed the new AQL graph traversal which is going to be released with ArangoDB 2.8 (currently in late beta)

Regarding your last point, we identify the concerned collections using named graphs which know the collections and their relation in the graph. I'll assume you will be using named graphs therefore.

Lets use one of the ArangoDB 2.8 Example graphs with arangosh:

var examples = require("org/arangodb/graph-examples/example-graph.js");
var graph = examples.loadGraph("traversalGraph");

It has its its ventices in the circles collection, and its edges connecting the vertices in the edges collection:

db.circles.toArray();
db.edges.toArray();

We now can use FILTER statements in conjunction with the travesal depth to prune the traversal of a graph branch early in the execution.

Here we filter for an attribute of a vertex to be found in the first layer of the traversal:

db._query("FOR v, e, p IN 1..3 " + 
              "OUTBOUND 'circles/A' " + 
              "GRAPH 'traversalGraph' " + 
              "FILTER p.vertices[1]._key != 'G' RETURN v._key");

We also can filter for attributes of edges in a similar manner:

db._query("FOR v, e, p IN 1..3 "
          "OUTBOUND 'circles/A' " +
          "GRAPH 'traversalGraph' " + 
          "FILTER p.edges[0].label != 'right_foo' RETURN v._key");

The AQL graph traversal chapter also has an in depth explanation, in which way the traverser walks over the graph.

Upvotes: 1

Related Questions